Tuesday, October 14, 2014

gem5: Instructions for Enabling Full System Memory Tracing

  1. Set up your proxy.
    • export http_proxy=http://myproxy.myserver.com:1234
  2. Clone the gem5-stable repo.
    •  hg clone http://repo.gem5.org/gem5-stable
  3. Install gem5's dependencies (http://www.m5sim.org/Dependencies).
    • Note that on CentOS 6, the standard repos don't provide a new enough version of SWIG. I was able to download SWIG from the SWIG website and build it.
    • On CentOS 6, the protobuf package doesn't include gzip_stream.h, which gem5 will ask for if you want to do memory tracing. I posted on StackOverflow about this problem, which eventually led me to the EPEL bug-tracking system. A bug was filed for this problem over two years ago, but it hasn't been fixed yet. To get around this problem, I downloaded protobuf from Google and built it.
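    • For reference, both packages build with the standard configure/make flow; a minimal sketch follows (the version numbers and install prefixes here are placeholders, so substitute your own):

tar xzf protobuf-2.5.0.tar.gz && cd protobuf-2.5.0
./configure --prefix=$HOME/protobuf-install
make && make install
cd ..
tar xzf swig-2.0.11.tar.gz && cd swig-2.0.11
./configure --prefix=$HOME/swig-install
make && make install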
  4. Add the bin directories for SWIG and protobuf builds to your PATH. 
  5. Add the lib directory for protobuf to your LIBRARY_PATH (note that this is not LD_LIBRARY_PATH).
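For example, assuming the install prefixes from the sketch above:

export PATH=$HOME/protobuf-install/bin:$HOME/swig-install/bin:$PATH
export LIBRARY_PATH=$HOME/protobuf-install/lib:$LIBRARY_PATH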
  6. Apply this patch to make necessary modifications to code: link.
    • Copy it into the gem5-stable/ directory and run "hg import memTraceExample.patch".
    • You will need to modify the path given on line 416 to point to your protobuf include directory:
main.Append(CPPPATH=[Dir('<path to protobuf install directory>/include/')])
    • This patch adds the necessary modifications to insert CommMonitors to track all accesses to L1, to main memory, and to the data TLB. Because the L1 log grows rapidly, you will want the option to turn it off for longer simulations. To do so, edit configs/common/CacheConfig.py and change the call to the CommMonitor constructor for the L1 to set trace_enable=False, as sketched below.
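For example, the patched constructor call in configs/common/CacheConfig.py might end up looking something like this (the exact variable name depends on the patch; trace_enable is the relevant knob):

l1_monitor = CommMonitor(trace_enable=False)  # disable the fast-growing L1 trace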
    • This patch also modifies the code that does virtual-to-physical memory translation so that it prints the mapping of a virtual address to a physical address when a new page is allocated. By default, the code doesn't print the mapping in this case.
  7. Test in system call emulation mode.
    • This won't be doing full system simulation. Although in the end we want full system simulation, doing a run in SE mode will verify that we have things working up to this point. Run:
./build/X86/gem5.opt --debug-flags=MMU --debug-file=mmu_trace.log ./configs/example/se.py --cpu-type=timing --caches --l2cache -c tests/test-progs/hello/bin/x86/linux/hello 
    • The "--debug-flags=MMU --debug-file=mmu_trace.log" options will create the debug file "mmu_trace.log" in the m5out/ directory which logs the virtual to physical address translations.
    • The trace files CT_mon{1,2,3}.trc.gz are written to the m5out/ directory. The debug file shows the mappings of virtual to physical addresses, while the other files show the traces of memory accesses to L1, to main memory, and to the data TLB.
    • To view the traces of memory accesses in the CT_mon files, you will use a script provided by gem5.
    • To use this script, you will need to set up the Python protobuf library. Go to the directory containing the protobuf package you downloaded in the earlier step. Go into the python subdirectory, and follow the steps in the README file.
    • To install to a directory other than the default, first run "export PYTHONPATH=<protobuf install directory>/lib/python2.6/site-packages/". Then, when you get to the step to run "python setup.py install", instead run "python setup.py install --prefix=<protobuf install directory>".
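    • In other words, with the placeholder expanded to your actual install directory, the two commands are:

export PYTHONPATH=<protobuf install directory>/lib/python2.6/site-packages/
python setup.py install --prefix=<protobuf install directory>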
    • Go back to your gem5-stable directory and run the commands:

export PYTHONPATH=$PYTHONPATH:<protobuf-2.5.0 package directory>/python/

./util/decode_packet_trace.py m5out/CT_mon1.trc.gz outputfile.log

    • This should translate the protobuf output log into an ASCII file named "outputfile.log" that contains the trace of L1 accesses.
  8. Test in full system mode.
    • To run in full system mode, you will need to download some more files from the gem5 website.
    • First, you'll need to download the full system files and config files for X86. They are available here: http://www.m5sim.org/Download. Extract both of these to a directory where they will live (for our example, we'll assume this directory is /home/user123/gem5-util/). They need to be in a place where gem5 can see them while it is running. When you extract these files, you will get two directories, "binaries" and "disks".
    • In the "disks" directory, there is a file called "linux-x86.img". This needs to be renamed to "x86root.img".
    • gem5 also needs a file called "linux2-bigswap.img" to be in the disks directory. The only place I've found this is in the full system files for the Alpha architecture posted on the gem5 website, here: http://www.m5sim.org/Download. Download the Alpha system files, extract them, and move the file "linux2-bigswap.img" from this package to the "disks" directory containing the file "x86root.img".
    • Now, set the environment variable M5_PATH to point to /home/user123/gem5-util (or wherever you put the "binaries" and "disks" directories).
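For example, with the directory layout used above:

export M5_PATH=/home/user123/gem5-util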
    • Run gem5 in full system mode: 
./build/X86/gem5.opt --debug-flags=TLB --debug-file=tlb_trace.log ./configs/example/fs.py --cpu-type=timing --caches --l2cache

    • Here again, the "--debug-flags" and "--debug-file" options will trace virtual to physical address translations. We use the "TLB" flag here rather than the "MMU" flag we used earlier because in full system simulation a different class performs these translations. Note that because you are logging all virtual to physical address translations, the "tlb_trace.log" file will get very large very fast.
    • To mitigate this problem of the exploding "tlb_trace.log" file, I modified the arch/x86/tlb.cc file to log translations only when page faults occur. Logging these translations will be important when we start executing our own applications: if we want to print out an address and map it back to the memory trace, we'll need to see how these translations take place. The modifications to tlb.cc with restricted logging can be obtained through these diff files: tlb.cc, tlb.hh.
    • In another terminal, connect to the gem5 simulation using the m5term utility:
    • cd into the directory util/term/, and run "./m5term 3456". This will connect you to the simulation. After the simulator is done booting the system, you should see a Linux command prompt. From this command prompt, you can run standard Linux commands. When you are done, shut down the simulation (as described next) and make sure that you got a TLB trace in m5out/tlb_trace.log.
    • Switch back over to the terminal where the gem5 simulation is running, and hit CTRL-C. This shuts down the simulator. Examine the log file at the top of the gem5-stable directory. This log file will have the name that you chose when you opened the ofstream in src/mem/comm_monitor.cc. It should contain a trace of memory accesses performed by the entire system that was simulated in the last run.
  9. Run a custom program in the full system simulation.
    • We will want to run our own custom programs with the gem5 full system simulator. Although we can compile programs from within the simulator, this would be fairly slow and tedious. Instead, we can compile the programs natively, mount the disk image from Linux, and copy the compiled program from our native Linux system into the mounted disk image, where it can be accessed by the simulated system.
    • The instructions for mounting the image are given on the gem5 website here: http://www.m5sim.org/Disk_images#Mounting_an_image. Run:
sudo mount -o loop,offset=32256 /home/user123/gem5-util/disks/x86root.img /home/user123/gem5-util/fsmount/
    • (This assumes you've already created the fsmount directory).
    • Now, within your natively running Linux, write a program - call it test.c, for example - that you want to run in the gem5 simulation. Compile it using gcc, and copy the binary into /home/user123/gem5-util/fsmount/bin/.
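    • For example (a minimal sketch; -static is just a suggestion to avoid shared-library mismatches between your native system and the simulated image, and the paths assume the layout above):

gcc -static -o test test.c
sudo cp test /home/user123/gem5-util/fsmount/bin/
sudo umount /home/user123/gem5-util/fsmount/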
    • Restart the gem5 simulation, and you should be able to run the test executable that you placed in the bin directory from within the gem5 simulation.
    • After running your program, you will probably want to look in the memory trace to see how it is using memory. However, if you print out addresses in your program, they will print as virtual addresses, while the memory traces log physical addresses. Hence, you need to use the tlb_trace.log file to find out what mappings took place. Say you printed out some address 0x2ac97c7a8010 used by your program. Look in tlb_trace.log for an address beginning with 0x2ac97c7a8 (with 4 KB pages, that prefix identifies the page containing your address) to see where that page was faulted in. This will tell you which physical address your virtual address maps to.
    • Now, using the decode_packet_trace.py script described in a previous step, translate one of your CT_mon files to ASCII and search that file for your physical address. Note that these addresses are logged in decimal, not hex, so translate your physical address to decimal before searching. It should be in there. Also note that these addresses are going to be accessed on block boundaries (just as pages were during page faults). So, for instance, if you are looking at the log of accesses to external memory, and the block size of your last-level cache is 64B, you will see in the log an access to the 64B block containing your physical address.
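    • A quick way to do the hex-to-decimal conversion is with the shell's printf; for example, for a hypothetical physical address 0x7a8000:

printf '%d\n' 0x7a8000    # prints 8028160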


Caveats

  1. Last time I tried, if you copy files into the disk image while gem5 is running, gem5 won't be able to see them. I had to reboot gem5 for it to see the files I copied in. 
  2. I am still working on understanding the memory trace. For example, my log contains long strings of writes to the address 0x20000000000000b0, which I don't yet understand.



Results of Trials with other Tools


Before trying gem5, I tried to perform memory tracing with some other tools. Here are some notes on what I learned from trying those tools.

QEMU: The vCSIMx86 Version

A version of QEMU has been released with modifications specifically for gathering system-wide memory traces, available here. I was able to get it running and gather some traces, but I ran into some difficulties with it and found that it had some significant limitations, including the following:

  1. It works only with a very old version of QEMU (v. 1.1.0).
  2. After booting the virtual machine (which takes quite a while, even on a machine with fairly good specs), only the first run of an application will successfully initiate tracing.
  3. It isn't well-supported, as there doesn't appear to be a mailing list or other such resource for offering help with it.
  4. I didn't ever gain confidence that it was actually tracing all memory accesses in the system.
  5. Note that I contacted the author, and he confirmed that #1 and #2 are known limitations of the tool.

Monday, October 13, 2014

Autotools with Libtool in Ten Minutes or Less

I've created another version of the package in the previous post that provides file templates for autotools. This version includes the necessary modifications for invoking libtool.

This example is based on the tutorial to which I linked in that post. Note that one modification is necessary to the instructions given in that tutorial: the "autoreconf -fiv" command throws a warning (which turns into an error in my example, since I have the "-Werror" option set in "configure.ac") if "AM_PROG_AR" does not appear in "configure.ac". Note also that "AM_PROG_AR" must appear before "LT_INIT".
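So the relevant portion of "configure.ac" ends up ordered something like this:

AM_INIT_AUTOMAKE([foreign -Wall -Werror])
AM_PROG_AR
LT_INIT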

The build steps are the same as in the previous post: 1. Untar the file; 2. Enter the directory; 3. autoreconf -fiv; 4. configure; 5. make.

Note that the file "hello" that make now produces is a script, rather than an executable. To debug this script, execute "libtool --mode=execute gdb hello". This requires that you build with debugging turned on via the "-g" flag. It's also a good idea to turn off optimizations, which you can do with the "-O0" flag. Set these flags when you run configure, like so: "./configure CFLAGS='-O0 -g'".
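Putting those pieces together, a debugging session looks like this:

./configure CFLAGS='-O0 -g'
make
libtool --mode=execute gdb hello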

Autotools in Ten Minutes or Less

I've recently been getting more exposure to autotools. I found out quickly that when learning autotools, one can easily get overwhelmed with details that aren't relevant to one's purposes. The complexity of it all can easily be observed when one comes across, for example, an introductory tutorial that is 162 pages long. For me, this gets frustrating.

I've found that for 90% of cases, all I need is a set of simple templates of the files containing the basic elements for building a package with autotools. For this reason, I've put together such a template here, which builds a simple "hello world" program. I've also included the components necessary to build a parser using flex and bison, since this is a common need which requires some modifications to the files.

To build the example, simply untar the tarball, enter the directory, type "autoreconf -fiv", "configure", and then "make". And that's it. You can build the distribution tarball by typing "make dist".
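That is (with the tarball and directory names as placeholders):

tar xzf <tarball>
cd <package directory>
autoreconf -fiv
./configure
make
make dist    # optional: build the distribution tarball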

To remove the parser from the build, open "src/Makefile.am" and comment out "parser" from the "bin_PROGRAMS" variable, along with the following three lines (the lines starting with "parser_SOURCES", "AM_YFLAGS", and "BUILT_SOURCES").

Modifying the files for your own project is straightforward. Add your C files to the "src/" directory. Now, add the name of your executable to build to the "bin_PROGRAMS" variable. Say that the name of your new program is "myprog". Now, create a variable "myprog_SOURCES", and assign to it the list of .c and .h files on which it depends. Finally, rerun "./configure" and "make".
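For the hypothetical "myprog" example, the additions to "src/Makefile.am" would look something like this (the source file names are placeholders):

bin_PROGRAMS = hello myprog
myprog_SOURCES = myprog.c myprog.h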

To learn about more of the details of how autotools works so that you can customize the example to suit a broader range of purposes, the best tutorial I've found is the one linked above.

In the future, I might modify this example to include provisions for libtool, which is used widely alongside autotools.



To generate the build system for a directory structure containing your own source files, the following should work in basic cases:

1. In the top-level directory of your project, run
$ autoscan
$ mv configure.scan configure.ac

2. Now edit configure.ac.
a. Change:
AC_INIT([FULL-PACKAGE-NAME], [VERSION], [BUG-REPORT-ADDRESS])
to
AC_INIT([my_package], [0.1], [myemail@organization.com])
or whatever your package name, version, and email address are.

b. For the line:
AC_CONFIG_SRCDIR([src/cpp/my_source.cpp])
Make sure that "src/cpp/my_source.cpp" is a source file in your project.

c. Add the line
AM_INIT_AUTOMAKE([foreign -Wall -Werror])
after the line near the top containing AC_CONFIG_HEADERS.

d. Add the lines
AC_CONFIG_FILES([Makefile
src/cpp/Makefile])

listing all the Makefiles to be built.

e. If the package depends on another library, you can force the user of the configure script to specify the location of this library through the AC_ARG_WITH macro. For example, say the SystemC library is needed. To allow the user to specify the location of SystemC with a "--with-systemc" argument to the configure script, add the lines below. The AC_SUBST macro allows the use of the SYSC_LOC variable in the Makefiles.

AC_ARG_WITH([systemc],
  [AS_HELP_STRING([--with-systemc],
    [Specify where the SystemC library is installed.])],
  [sysc_loc=$withval
   AC_SUBST(SYSC_LOC,$sysc_loc)
   AC_DEFINE([HAVE_SYSC],[1],[Define when SYSC is enabled.])],
  [sysc=no])

You can then have configure check that the SystemC headers can be found:

AC_LANG_PUSH(C++)
CPPFLAGS="$CPPFLAGS -I$sysc_loc/include/"
AC_CHECK_HEADERS([systemc.h],
  [],
  [AC_MSG_ERROR([Need to specify location of SystemC through --with-systemc.])])
AC_LANG_POP(C++)

f. Add a Makefile.am file to each directory that has a Makefile listed above, following the formats of the Makefile.am files found in the tarball linked above. The Makefile.am in the top-level directory usually just points to the subdirectories containing other Makefile.am files.
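For the layout used above, the top-level Makefile.am can be as simple as:

SUBDIRS = src/cpp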

3. All files should now be in place to create the build system. Now run:
$ autoreconf -fiv
$ ./configure
$ make