===================
= MemTest-86 v3.0 =
===================

Table of Contents
=================
  1) Introduction
  2) Licensing
  3) Installation
  4) Serial Console
  5) Online Commands
  6) Memory Sizing
  7) Error Information
  8) Trouble-shooting Memory Errors
  9) Execution Time
 10) Memory Testing Philosophy
 11) Memtest86 Test Algorithms
 12) Individual Test Descriptions
 13) Problem Reporting - Contact Information
 14) Known Problems
 15) Planned Features List
 16) Change Log
 17) Acknowledgments


1) Introduction
===============
Memtest86 is a thorough, stand-alone memory test for Intel i386 architecture
systems. BIOS-based memory tests are only a quick check and often miss
failures that are detected by Memtest86.

For updates go to the Memtest86 web page:

    http://www.memtest86.com

To report problems or provide feedback send email to:

    cbrady@cray.com


2) Licensing
============
Memtest86 is released under the terms of the GNU General Public License
(GPL). Other than the provisions of the GPL there are no restrictions for
use, private or commercial. See: http://www.gnu.org/licenses/gpl.html for
details.


3) Installation
===============
Memtest86 is a stand-alone program that cannot be executed under Windows and
must be loaded from a floppy disk.

To install Memtest86:

  - Extract the files from the zip archive.

  - Open the directory where the files were extracted and click on
    "install.bat".

  - The install program will prompt you for the floppy drive and also
    prompt you to insert a blank floppy.

  - To run Memtest86 leave the floppy in the drive and reboot.

NOTE: After the boot floppy has been created you will not be able to read
the floppy from Windows. This is normal.


4) Serial Console
=================
Memtest86 can be used on PCs equipped with a serial port for the console.
By default serial port console support is not enabled since it slows
down testing. To enable it, change the SERIAL_CONSOLE_DEFAULT define in
config.h from a zero to a one.
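As a rough illustration, the relevant lines of config.h might look like the
sketch below once serial console support is turned on. Only the two define
names come from this document; the baud rate value is an assumed example and
the rest of config.h is omitted.

```c
/* config.h (excerpt): hedged sketch, not the complete file */

/* 0 = serial console disabled (the default), 1 = enabled */
#define SERIAL_CONSOLE_DEFAULT 1

/* Baud rate for the serial console; 9600 is an assumed example value */
#define SERIAL_BAUD_RATE 9600
```

Since these are compile-time defines, a change takes effect only after
Memtest86 has been rebuilt from source.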
The serial console baud rate may also be set in config.h with the
SERIAL_BAUD_RATE define. The other serial port settings are no parity,
8 data bits, 1 stop bit. All of the features used by Memtest86 are
accessible via the serial console. However, the screen is sometimes
garbled when the online commands are used.


5) Online Commands
==================
Memtest86 has a limited number of online commands. Online commands
provide control over caching, test selection, address range and error
scrolling. A help bar is displayed at the bottom of the screen listing
the available online commands.

  Command  Description

  ESC      Exits the test and does a warm restart via the BIOS.

  c        Enters the test configuration menu.
           Menu options are:
               1) Cache mode
               2) Test selection
               3) Address Range
               4) Memory Sizing
               5) Error Summary
               6) Error Report Mode
               7) ECC Mode
               8) Restart Test
               9) Reprint Screen

  SP       Set scroll lock (stops scrolling of error messages).
           Note: Testing is stalled when the scroll lock is
           set and the scroll region is full.

  CR       Clear scroll lock (enables error message scrolling).


6) Memory Sizing
================
The BIOS in modern PCs will often reserve several sections of memory for
its own use and also to communicate information to the operating system
(for example, ACPI tables). It is just as important to test these reserved
memory blocks as it is to test the remainder of memory. For proper
operation all of memory needs to function properly regardless of what the
eventual use is. For this reason Memtest86 has been designed to test as
much memory as is possible.

However, safely and reliably detecting all of the available memory has been
problematic. Versions of Memtest86 prior to v2.9 would probe to find where
memory is. This works for the vast majority of motherboards but is not 100%
reliable. Sometimes the detected memory size is incorrect and, worse,
probing the wrong places can in some cases cause the test to hang or crash.

Starting in version 2.9 alternative methods are available for determining
the memory size.
By default the test attempts to get the memory size from the BIOS using
the "e820" method. With "e820" the BIOS provides a table of memory segments
and identifies what they will be used for. By default Memtest86 will test
all of the RAM marked as available and also the area reserved for the ACPI
tables. This is safe since the test does not use the ACPI tables and the
"e820" specifications state that this memory may be reused after the tables
have been copied. Although this is a safe default some memory will not be
tested.

Two additional options are available through online configuration options.
The first option (BIOS-All) also uses the "e820" method to obtain a memory
map. However, when this option is selected all of the reserved memory
segments are tested, regardless of what their intended use is. The only
exception is memory segments that begin above 3 GB. Testing has shown that
these segments are typically not safe to test. The BIOS-All option is more
thorough but could be unstable with some motherboards.

The second option for memory sizing is the traditional "Probe" method.
This is a very thorough but not entirely safe method. In the majority of
cases the BIOS-All and Probe methods will return the same memory map.

For older BIOSes that do not support the "e820" method there are two
additional methods (e801 and e88) for getting the memory size from the
BIOS. These methods only provide the amount of extended memory that is
available, not a memory table. When the e801 and e88 methods are used the
BIOS-All option will not be available.

The MemMap field on the display shows what memory size method is in use.
Also the RsvdMem field shows how much memory is reserved and is not being
tested.


7) Error Information
====================
Memtest86 has two options for reporting errors. The default is to report
individual errors. In BadRAM Patterns mode patterns are created for use
with the Linux BadRAM feature. This slick feature allows Linux to avoid
bad memory pages.
Details about the BadRAM feature can be found at:

    http://home.zonnet.nl/vanrein/badram

For individual errors the following information is displayed when a memory
error is detected. An error message is only displayed for errors with a
different address or failing bit pattern. All displayed values are in
hexadecimal.

  Tst:              Test number
  Failing Address:  Failing memory address
  Good:             Expected data pattern
  Bad:              Failing data pattern
  Err-Bits:         Exclusive or of good and bad data (this shows the
                    position of the failing bit(s))
  Count:            Number of consecutive errors with the same address
                    and failing bits

In BadRAM Patterns mode, lines are printed in the form
badram=F1,M1,F2,M2. In each F/M pair, the F represents a fault address,
and the corresponding M is a bitmask for that address. These patterns
state that faults have occurred in addresses that equal F on all "1" bits
in M. Such a pattern may capture more errors than actually exist, but at
least all the errors are captured. These patterns have been designed to
capture regular patterns of errors caused by the hardware structure in a
terse syntax.

The BadRAM patterns are "grown" incrementally rather than "designed" from
an overview of all errors. The number of pairs is constrained to five for
a number of practical reasons. As a result, handcrafting patterns from the
output in address printing mode may, in exceptional cases, yield better
results.


8) Trouble-shooting Memory Errors
=================================
Please be aware that not all errors reported by Memtest86 are due to
bad memory. The test implicitly tests the CPU, L1 and L2 caches as well as
the motherboard. It is impossible for the test to determine what causes
the failure to occur. Most failures will be due to a problem with memory.
When it is not, the only option is to replace parts until the failure is
corrected.

Once a memory error has been detected, determining the failing
SIMM/DIMM module is not a clear-cut procedure.
With the large number of motherboard vendors and possible combinations of
SIMM slots it would be difficult if not impossible to assemble complete
information about how a particular error would map to a failing memory
module. However, there are steps that may be taken to determine the
failing module. Here are three techniques that you may wish to use:

1) Removing modules
This is the simplest method for isolating a failing module, but it may
only be employed when one or more modules can be removed from the system.
By selectively removing modules from the system and then running the test
you will be able to find the bad module(s). Be sure to note exactly which
modules are in the system when the test passes and when the test fails.

2) Rotating modules
When none of the modules can be removed then you may wish to rotate modules
to find the failing one. This technique can only be used if there are
three or more modules in the system. Change the location of two modules
at a time. For example, put the module from slot 1 into slot 2 and put the
module from slot 2 into slot 1. Run the test and if either the failing bit
or address changes then you know that the failing module is one of the
ones just moved. By using several combinations of module movement you
should be able to determine which module is failing.

3) Replacing modules
If you are unable to use either of the previous techniques then you are
left to selective replacement of modules to find the fai...