<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
<chapter id="sg-manual"
         xreflabel="SGCheck: an experimental stack and global array overrun detector">
  <title>SGCheck: an experimental stack and global array overrun detector</title>
<para>To use this tool, you must specify
<option>--tool=exp-sgcheck</option> on the Valgrind
command line.</para>
<sect1 id="sg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>SGCheck is a tool for finding overruns of stack and global
arrays. It uses a heuristic derived from an observation about the
likely forms of stack and global array accesses.</para>
</sect1>

<sect1 id="sg-manual.options" xreflabel="SGCheck Command-line Options">
<title>SGCheck Command-line Options</title>

<para>There are no SGCheck-specific command-line options at present.</para>
<!--
<para>SGCheck-specific command-line options are:</para>

<variablelist id="sg.opts.list">
</variablelist>
-->
</sect1>

<sect1 id="sg-manual.how-works.sg-checks"
       xreflabel="How SGCheck Works">
<title>How SGCheck Works</title>

<para>When a source file is compiled
with <option>-g</option>, the compiler attaches DWARF3
debugging information which describes the location of all stack and
global arrays in the file.</para>
<para>Checking of accesses to such arrays would then be relatively
simple, if the compiler could also tell us which array (if any) each
memory referencing instruction was supposed to access. Unfortunately
the DWARF3 debugging format does not provide a way to represent such
information, so we have to resort to a heuristic technique to
approximate it. The key observation is that
   <emphasis>
   if a memory referencing instruction accesses inside a stack or
   global array once, then it is highly likely to always access that
   same array</emphasis>.</para>
<para>To see how this might be useful, consider the following buggy
fragment:</para>

<programlisting><![CDATA[
   { int i, a[10];  // both are auto vars
     for (i = 0; i <= 10; i++)
        a[i] = 42;
   }
]]></programlisting>
<para>At run time we will know the precise address
of <computeroutput>a[]</computeroutput> on the stack, and so we can
observe that the first store resulting from <computeroutput>a[i] =
42</computeroutput> writes <computeroutput>a[]</computeroutput>, and
we will (correctly) assume that the instruction is always intended to
access <computeroutput>a[]</computeroutput>. Then, on the 11th
iteration, it accesses somewhere else, possibly a different local,
possibly an unaccounted-for area of the stack (e.g., a spill slot), so
SGCheck reports an error.</para>
<para>There is an important caveat.</para>

<para>Imagine a function such as <function>memcpy</function>, which is used
to read and write many different areas of memory over the lifetime of the
program. If we insist that the read and write instructions in its memory
copying loop only ever access one particular stack or global variable, we
will be flooded with errors resulting from calls to
<function>memcpy</function>.</para>
<para>To avoid this problem, SGCheck instantiates fresh likely-target
records for each entry to a function, and discards them on exit. This
allows detection of cases where (e.g.) <function>memcpy</function>
overflows its source or destination buffers for any specific call, but
does not carry any restriction from one call to the next. Indeed,
multiple threads may make multiple simultaneous calls to
(e.g.) <function>memcpy</function> without mutual interference.</para>
</sect1>

<sect1 id="sg-manual.cmp-w-memcheck"
       xreflabel="Comparison with Memcheck">
<title>Comparison with Memcheck</title>

<para>SGCheck and Memcheck are complementary: their capabilities do
not overlap. Memcheck performs bounds checks and use-after-free
checks for heap arrays. It also finds uses of uninitialised values
created by heap or stack allocations. But it does not perform bounds
checking for stack or global arrays.</para>

<para>SGCheck, on the other hand, does do bounds checking for stack or
global arrays, but it doesn't do anything else.</para>
</sect1>

<sect1 id="sg-manual.limitations"
       xreflabel="Limitations">
<title>Limitations</title>

<para>This is an experimental tool, which relies rather too heavily on some
not-as-robust-as-I-would-like assumptions about the behaviour of correct
programs. There are a number of limitations which you should be aware
of:</para>
<para>False negatives (missed errors): it follows from the
description above (<xref linkend="sg-manual.how-works.sg-checks"/>)
that the first access by a memory referencing instruction to a
stack or global array creates an association between that
instruction and the array, which is checked on subsequent accesses
by that instruction, until the containing function exits. Hence,
the first access by an instruction to an array (in any given
function instantiation) is not checked for overrun, since SGCheck
uses that as the "example" of how subsequent accesses should
behave.</para>
<para>False positives (false errors): similarly, and more seriously,
it is clearly possible to write legitimate pieces of code which
break the basic assumption upon which the checking algorithm
depends. For example:</para>

<programlisting><![CDATA[
   { int a[10], b[10], *p, i;
     for (i = 0; i < 10; i++) {
        p = /* arbitrary condition */ ? &a[i] : &b[i];
        *p = 42;
     }
   }
]]></programlisting>
<para>In this case the store sometimes
accesses <computeroutput>a[]</computeroutput> and
sometimes <computeroutput>b[]</computeroutput>, but in no cases is
the addressed array overrun. Nevertheless the change in target
will cause an error to be reported.</para>
<para>It is hard to see how to get around this problem. The only
mitigating factor is that such constructions appear very rare, at
least judging from the results obtained with the tool so far. Such a
construction appears only once in the Valgrind sources (running
Valgrind on Valgrind) and perhaps two or three times for a start
and exit of Firefox. The best that can be done is to suppress the
errors.</para>
<para>Performance: SGCheck has to read all of
the DWARF3 type and variable information for the executable and its
shared objects. This is computationally expensive and makes
startup quite slow. You can expect debuginfo reading time to be in
the region of a minute for an OpenOffice-sized application, on a
2.4 GHz Core 2 machine. Reading this information also requires a
lot of memory. To make it viable, SGCheck goes to considerable
trouble to compress the in-memory representation of the DWARF3
data, which is why the process of reading it appears slow.</para>
<para>Performance: SGCheck runs slower than Memcheck. This is
partly due to a lack of tuning, but partly due to algorithmic
difficulties. The
stack and global checks can sometimes require a number of range
checks per memory access, and these are difficult to short-circuit,
despite considerable efforts having been made. A
redesign and reimplementation could potentially make it much faster.
</para>
<para>Coverage: Stack and global checking is fragile. If a shared
object does not have debug information attached, then SGCheck will
not be able to determine the bounds of any stack or global arrays
defined within that shared object, and so will not be able to check
accesses to them. This is true even when those arrays are accessed
from some other shared object which was compiled with debug
info.</para>
<para>At the moment SGCheck accepts objects lacking debuginfo
without comment. This is dangerous as it causes SGCheck to
silently skip stack and global checking for such objects. It would
be better to print a warning in such circumstances.</para>
<para>Coverage: SGCheck does not check whether the areas read
or written by system calls overrun stack or global arrays. This
would be easy to add.</para>
<para>Platforms: the stack/global checks work properly only on
X86 and AMD64 targets, not on PowerPC, ARM or S390X platforms.
That's because the stack and global checking requires tracking
function calls and exits reliably, and there's no obvious way to do
it on ABIs that use a link register for function returns.
</para>
<para>Robustness: related to the previous point. Function
call/exit tracking for X86 and AMD64 is believed to work properly
even in the presence of longjmps within the same stack (although
this has not been tested). However, code which switches stacks is
likely to cause breakage/chaos.</para>
</sect1>

<sect1 id="sg-manual.todo-user-visible"
       xreflabel="Still To Do: User-visible Functionality">
<title>Still To Do: User-visible Functionality</title>

<para>Extend system call checking to work on stack and global arrays.</para>

<para>Print a warning if a shared object does not have debug info
attached, or if, for whatever reason, debug info could not be
found, or read.</para>

<para>Add some heuristic filtering that removes obvious false
positives. This would be easy to do. For example, an access
transition from a heap to a stack object almost certainly isn't a
bug and so should not be reported to the user.</para>
</sect1>

<sect1 id="sg-manual.todo-implementation"
       xreflabel="Still To Do: Implementation Tidying">
<title>Still To Do: Implementation Tidying</title>

<para>Items marked CRITICAL are considered important for correctness:
failure to fix them is liable to lead to crashes or assertion
failures.</para>
<para>sg_main.c: Redesign and reimplement the basic checking
algorithm. It could be done much faster than it is; the current
implementation isn't very good.
</para>

<para>sg_main.c: Improve the performance of the stack / global
checks by doing some up-front filtering to ignore references in
areas which "obviously" can't be stack or globals. This will
require using information that m_aspacemgr knows about the address
space.</para>
<para>sg_main.c: fix compute_II_hash to make it a bit more sensible
for ppc32/64 targets (except that SGCheck doesn't work on ppc32/64
targets, so this is a bit academic at the moment).</para>

</sect1>

</chapter>