Some Amigas are very stable, others crash almost as often as Macs and PCs. The difference can be analysed and changed, with good tools and knowledge of the right approach.
Little has changed, except these days you're less likely to receive any message. Graphical User Interfaces try to avoid saying anything, resorting to Gurus - notices that something awful has affected one or all the things you are doing - and, at worst, spontaneous lock-ups and resets.
Much of my advice deals with software patches and settings, but they're no good if your hardware is liable to spontaneously self-destruct. The key word is spontaneous. Hardware problems happen regardless of what programs you are running.
If you suspect a hardware fault, create a "clean" Workbench partition from original floppies and boot from that. If that does not crash when using just the standard system files, your problem is likely to be caused, and cured, by software.
But if your machine is flaky whatever you run, this is the section for you. The programs in use shouldn't matter, although if a drive fault is suspected it's worth setting up a big copy operation. Chugging away at drives also tests the power supply more effectively than silent running.
The four classic hardware problems are overheating, insufficient power, loose connectors and pushing too hard. It's best to start by checking the first two because they're the most obvious.
The main Amiga power cables supply +5 volts for digital logic, and +12V for older drives and interfaces. CDs and larger hard drives usually still need +12V but modern floppies and small hard drives manage on +5 volts alone.
If either of these voltages drops by ten per cent or more, the system is likely to fail. PowerPCs and graphics cards are particularly sensitive to voltage, and may need direct connections to ensure power gets through from a beefed-up supply.
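As a rough illustration of the ten per cent rule, the sketch below checks a pair of meter readings against their nominal rails. The function name and the sample readings are made up for the example; only the 10 per cent threshold comes from the text.

```python
def rail_ok(nominal_volts, measured_volts, max_drop=0.10):
    """Return True if the rail has sagged by less than max_drop (a fraction)."""
    drop = (nominal_volts - measured_volts) / nominal_volts
    return drop < max_drop

# Hypothetical voltmeter readings, keyed by the nominal rail voltage.
readings = {5.0: 4.82, 12.0: 10.7}

for nominal, measured in readings.items():
    status = "ok" if rail_ok(nominal, measured) else "too low"
    print(f"+{nominal:g}V rail reads {measured}V: {status}")
```

Here the +5V rail passes, but the +12V rail has lost more than a tenth of its voltage and would be suspect.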
You can check the main power rails with a voltmeter at the floppy connector, but the voltage may sag more at crucial places in the circuit, like daughterboards. It's dangerous to poke around with the power on, so don't try this unless you know which pins you need to test and can access them easily.
Serial and audio ports use minus 12V as well as the plus 12V supply to derive symmetrical signals. If sound is recognisable but grossly distorted, perhaps alongside serial port problems, yet the machine otherwise works, check the 7905 regulator in the -12V supply on a big-box Amiga, or check for -12V on the middle pin on domestic Amigas, which is rated at a tenth of an amp.
If your big-box computer crashes at bootup, the initial load might be swamping your power supply. This can even affect tower systems; one work-around is to delay spinning up drives with software or jumper settings. Big drives can apply a delay based on their SCSI ID, or can be told not to spin up until accessed - this makes initialisation slower but safer.
Diagnosing overheating involves two symptoms - apparent heat somewhere, or even a hot smell, suggesting damage - and a system that gets increasingly crash-prone the longer you run it. Such problems are noticeably worse in warm weather.
Ventilation cures overheating, so consider raising your fanless computer off the carpet, or moving cables or other components to prevent congestion in a big Amiga. Ensure airflow around the power supply case as well as the computer. You need not drill holes, Sinclair-style, if the ones Commodore provided are unblocked, but might find it prudent to leave desktop system trapdoors open.
In extremis you can run any Amiga without its case. This minimises overheating risks but could be fatal if something conductive drops into a crucial place. This is de rigueur for Simon and Richard, but a dangerous pose unless you enjoy fiddling with hardware. Nude computers interfere with AM radio systems nearby, signalling their activity to gurus but annoying others.
Fans are noisy and collect dirt. They're essential in big systems but best avoided if you can make convection - the tendency of warm air to rise - create the flow for you. Heatsinks, metal blocks that carry heat away from parts that use power, extend the life of any chip that gets hot to the touch, as long as there's airflow around them. The metal case of a big Amiga is an important heatsink in its own right. Drives and boards in cramped places benefit from heatsinks, but they are of marginal benefit compared with airflow.
The Amiga trapdoor ports and Zorro 3 processor socket carry the most critical signals. To trim costs, these big connectors grip less than positively, and you're lucky to find a board so well-engineered that it works first time after replacement.
Make sure it's plugged in straight and all the way. Try again if you get a solid coloured screen when you power up, indicating a synchronisation problem, caused by a loose connector, total incompatibility or a blown motherboard, in order of probability and preference.
Disable reselection in HDToolbox to cure some SCSI problems
IDE master and slave combinations are not standardised, requiring specific jumper information for all your drives. Type the part number into a Web search engine to locate drive specifications. Test drives individually if a combination fails. Limit the MaxTransfer size to 0xFE00 unless you know your IDE drive can handle more.
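To put a number on that limit: 0xFE00 is 65024 bytes, or 127 sectors of 512 bytes. The sketch below shows the general idea of a transfer cap, on the assumption that a request larger than MaxTransfer is broken into several smaller ones. The function name and the 200,000-byte request are invented for the example, and this is not the file system's actual code.

```python
MAX_TRANSFER = 0xFE00   # 65024 bytes, the limit suggested in the text
SECTOR_SIZE = 512       # assumed sector size in bytes

def split_transfer(total_bytes, max_transfer=MAX_TRANSFER):
    """Split one large request into chunk sizes no bigger than max_transfer."""
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, max_transfer)
        chunks.append(chunk)
        remaining -= chunk
    return chunks

print(MAX_TRANSFER // SECTOR_SIZE)   # whole sectors that fit in one chunk
print(split_transfer(200_000))       # chunk sizes for a hypothetical request
```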
Check the Net for details of drive setup and spin delay jumpers
The external plugs on the Amiga are liable to mechanical and consequent electrical faults. You can lock up or reset the machine from almost any of them! Serial hardware faults suggest cable or -12V supply problems, or too high a baud rate. If you fail to plug Zorro cards in properly the intelligent bus controller usually ignores them completely. Overheating boards are present from cold but disappear after a reset.
If a machine is really flaky, unplug everything you can and test a bare system. If this crashes, you need a new motherboard, or chip swaps on A3000 or earlier systems. Replace the add-ons, checking as you go. This is tedious as you have to power down between each change and the next.
If you overclock your processor, pile on gadgets regardless of power limits, use the top scan and baud rates, squeeze extra tracks and sectors out of your drives, run cables as far as possible in tight spaces, and insist on testing "Beta" software, you will have problems. You may enjoy fixing them, and the benefit might outweigh the cost, but your system's stability will suffer.
Every time you try something new, you run the risk of losing something you already have. Most stable computers are set up and left alone, with nothing more than backups and a little file housekeeping to disturb the applications. I dedicate one machine for tests and another to serious work.
Whenever you add or remove a card, utility or DOSdriver, keep a mental note of what you've done. That knowledge will enable you to restore a stable system after a failure, when millions of other tweaks will make no significant difference.
MUI_Showboards identifies all Zorro cards working in your system
Nothing lasts forever. Mechanical components like mice, keyboards and disk drives deteriorate steadily over the years, but most survive till something traumatic happens to them. As soon as they become flaky, get another - it will only get worse, and could stop working completely at any time. There's scant difference between 0 and 100 per cent in digital systems, and anything that moves or gets hot eventually succumbs to mechanical failure.
Reduce baud rates and use RTS/CTS handshaking to avoid serial overruns
Division by zero is a sure sign of sloppy programming
Many problems stem from interactions rather than a single cause, so make sure that the components of your software are compatible. Run Workbench files made for your Kickstart, rather than a hodge-podge from other versions. Match libraries; weird problems are likely if you mix RTG components or versions of IXEMUL and IXNET. To check the version of a file, type VERSION (file) FULL, where "(file)" is the library or device you want to test.
SetPatch is the official "fix file", invented by Commodore and updated by Amiga Technologies and Amiga Inc. SetPatch collects major system bug-fixes in one program which runs at the start of any reliable startup-sequence. Recent versions suit all Amigas from Kickstart 2.0 onwards, installing just what your system needs. Type SetPatch in a shell to see the version, and what it fixed.
SetPatch resolves known problems on this AGA 060 Kickstart 3.1 Amiga
If your computer stops dead, try pressing the Caps Lock key a dozen or more times. If the light gets stuck, on or off, communication between keyboard and processor has failed, and you must reboot. If the crash was during startup, hold both mouse buttons through reset and select "boot with no startup sequence". Rename your WBStartup and User-startup drawers temporarily, and try to boot without the hacks and extensions therein. Reintroduce these cautiously, till you find the one causing the problem.
Extra ECHO and WAIT lines in User-startup track progress when a system crashes while booting. WBStartup+ selectively disables startup commodities and determines the order in which extensions are loaded. Some interact badly at first, so priority juggling can persuade an otherwise incompatible collection to co-operate.
To track and eliminate deadlocks, move mount files from your Devs to Storage drawer, then mount them individually by clicking on the WB3 icons, or issuing MOUNT commands on old Amigas, to work out which one is getting stuck. This may indicate an interface, drive or cable problem.
WAITVAL SYS: at the beginning of your startup-sequence prevents a host of error messages if the computer needs to revalidate the system partition after a reset. This is likely if a prior crash occurred while it was updating system files.
The original Workbench 1 instability report, for Gurus only
Kickstart 1's "Task Held" message is a sign of a crash in the offing
Kickstarts 2 and above give the option to suspend tasks that would otherwise crash
Programs that stop with code "87000004" were meant to run from a Shell, and fail because they lack Workbench startup code. Start them from a direct CLI command or an IconX script, and these crashes will disappear. "8000000B" indicates a coprocessor exception, common if you try to use programs compiled for another type of FPU or MMU.
Last Alert tells you the cause of a crash when you next reboot
The prevalent Guru codes detected by the processor start with 8000000 followed by 2, 3, 4 or A. These mean that the processor has encountered a daft instruction, usually because it's jumped out of the real program or something has overwritten that code. Such corruption is the prime cause of crashes, but tools can detect and prevent it.
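The last digits of these processor Gurus are the 68000 exception vector numbers, so they can be looked up mechanically. The sketch below is a minimal decoder, not a full alert table; the function name is made up, and it masks off the top bit on the understanding that this bit marks the alert as unrecoverable.

```python
# 68000 exception vector numbers reported in processor Gurus.
CPU_TRAPS = {
    0x02: "bus error",
    0x03: "address error",
    0x04: "illegal instruction",
    0x05: "divide by zero",
    0x06: "CHK instruction",
    0x07: "TRAPV instruction",
    0x08: "privilege violation",
    0x09: "trace",
    0x0A: "line-A (1010) opcode",
    0x0B: "line-F (1111) opcode, often a coprocessor instruction",
}

def describe_guru(code_text):
    """Name the processor trap behind a Guru code given as hex text."""
    code = int(code_text, 16)
    vector = code & 0x7FFFFFFF   # drop the top "dead end" bit
    if vector in CPU_TRAPS:
        return CPU_TRAPS[vector]
    return "not a processor trap"

for guru in ("80000003", "80000004", "8000000B", "87000004"):
    print(guru, "->", describe_guru(guru))
```

Codes such as 87000004 carry a subsystem number in the upper digits, so they fall outside this table and need the full alert list.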
This cryptic Guru means software tries to access non-existent hardware
Thor's Guardian Angel reveals a hole in AMOS memory allocation
Programs and data are interchangeable in memory. This is a great strength of the Turing/von Neumann computer architecture, but also the root of most bugs. If a program puts data in the wrong place anything could happen later, and it may be hard to connect the perpetrator with the result.
Illegal instructions are usually meant to be data, or corrupted code
Unix and latterly Windows and Mac systems use hardware to detect memory addressing errors, but mainly to implement "virtual memory", swapping programs and data to and from temporary disk space. This is always dodgy on Amigas. GigaMem and VMM are certain to get knotted if they search a system list that has been swapped out during an AmigaOS "critical region". If you run out of space, use an application with its own VM routines, rather than a system-wide afterthought, or preferably get more real memory.
MuForce shows how often programs "fall through" to access address zero
These "enforcers" trap and report attempts to access memory which is not owned by the task. They consume negligible resources unless your software does risky things, and are an excellent way to sort wheat from chaff. Every "hit" generates a pile of numbers, recording the local context of the exception for programmers; the task name and operation trapped give most away.
Poorly-tested C programs often try to access structures without setting a base address, so they end up fiddling around in low memory. Enforcers block writes to this sensitive area, and return a relatively safe zero on reads. Address zero on an Amiga system normally holds the value zero, and many programs rely on that to stop themselves falling off the end of null-terminated lists!
Purify, MemSniff and many other memory management tools on AFCDs work without hardware assistance. Mungwall is the classic of this genre, and works best in conjunction with an enforcer.
Mungwall also allocates and marks extra space at each end of an allocation, so it can detect common problems where programmers narrowly miss the intended space. If you find such problems, save the debugger output and send it to the programmers. Distrust such applications, especially if they *write* values willy-nilly.
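The "wall" idea can be sketched in a few lines. This is a toy simulation of the technique, not Mungwall's own code: the wall size, fill byte and class name are all invented for the example. Each block gets a marked strip before and after the usable area, and any change to a strip shows that a program strayed outside its allocation.

```python
WALL_SIZE = 16     # hypothetical size of each wall, in bytes
WALL_BYTE = 0xBB   # hypothetical fill pattern for the walls

class GuardedBlock:
    """A block of simulated memory with a marked wall at each end."""

    def __init__(self, size):
        self.size = size
        # Fill everything with the wall pattern, then clear the usable part.
        self.memory = bytearray([WALL_BYTE]) * (size + 2 * WALL_SIZE)
        self.memory[WALL_SIZE:WALL_SIZE + size] = bytes(size)

    def write(self, offset, value):
        """Store a byte relative to the usable area, with no bounds check."""
        self.memory[WALL_SIZE + offset] = value

    def damaged_walls(self):
        """Return the names of any walls that no longer hold the pattern."""
        expected = bytes([WALL_BYTE]) * WALL_SIZE
        damaged = []
        if self.memory[:WALL_SIZE] != expected:
            damaged.append("front")
        if self.memory[WALL_SIZE + self.size:] != expected:
            damaged.append("rear")
        return damaged

block = GuardedBlock(8)
for i in range(9):            # off-by-one: writes 9 bytes into 8
    block.write(i, 0)
print(block.damaged_walls())  # names the wall that was hit
```

The classic off-by-one loop above narrowly misses the intended space and lands in the rear wall, which is exactly the kind of error the walls are there to catch.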
Tools like MemWatch and MemMeter highlight programs that "creep", allocating memory that they never release. This is another common Amiga programming error, for want of the "resource tracking" in Unix, Qdos and QNX. Unchecked creep eventually causes crashes.
Memory may run out because it is fragmented - split into too many sections
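A small worked example shows why fragmentation matters. The block sizes below are invented, and the check is a simplification that assumes an allocation must fit inside a single free block: the total free memory looks ample, yet no one block is big enough for the request.

```python
def can_allocate(free_blocks, request):
    """True if any single free block is large enough for the request."""
    return any(block >= request for block in free_blocks)

# Hypothetical free list: sizes of separate free regions, in bytes.
free_blocks = [40_000, 12_000, 30_000, 18_000]
request = 64_000

print("total free:   ", sum(free_blocks))
print("largest block:", max(free_blocks))
print("request fits: ", can_allocate(free_blocks, request))
```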
The amount of stack space a task gets depends on how it is started. Workbench icon info includes a "stack" parameter. Every task needs space for its registers while another process is using the processor, and many programs are happy with just a few kilobytes of stack, but languages like C and Pascal demand more, to cope with recursion and variable allocation inside their blocks.
Compiled programs crash if they run out of stack space
We've collected programs to monitor and manage stacks on AFCD47. Tools can dynamically report the amount of stack space a task is using, making it obvious when an overflow has occurred and a crash is impending. You can add stack space to a running program but it's safer to quit and start again with more.
Programs like CentreQuest, NewEdit and Amiga E tasks relocate their stacks without telling the system what they've done, so snoopers show a fixed, negative space. The StackSnoop drawer includes Thor's fix for a bug in the AmigaOS console device.
To discourage stack overflows, add the line: STACK 65536 at the start of your S:Shell-startup file. This allocates 60K more stack space for every command - that's often wasted, but preferable to a crash if you have memory to spare.
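The 60K figure is simple arithmetic, assuming the usual default shell stack of 4096 bytes (that default is an assumption here, implied by the numbers rather than stated in the text). The helper name is made up for the example.

```python
DEFAULT_STACK = 4096   # assumed default shell stack, in bytes
NEW_STACK = 65536      # value set by the STACK command in the text

def extra_stack_kb(new_stack, default_stack=DEFAULT_STACK):
    """Extra stack gained over the default, in kilobytes."""
    return (new_stack - default_stack) // 1024

print(NEW_STACK // 1024, "K total")
print(extra_stack_kb(NEW_STACK), "K more than the default")
```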
Hacks may introduce new bugs, for instance the original WritePixel8 chunky graphics routine is slow and corrupts its input; patches are faster but go awry if two programs try to use them at once! Angela Schmidt's Kiskometer monitors system patches, warning of programs that compete, patching the same function for different purposes.
Kiskometer monitors exactly who is changing what in your system
StackWatch is one of a stack of tools to monitor space (on AFCD47)
If a program pauses and sometimes crashes when started, it may be badly packed. Try invoking it with your processor cache disabled, using the CPU NOCACHE shell command. You should be able to restore full speed with CPU CACHE, after unpacking. Run the file through a late version of Imploder or PowerPacker, extract the original and re-pack it safely. For optimal stability, avoid packers; they introduce avoidable risks and fragment memory.
Software written for old 68000s may trigger privilege violations on new chips
FixGetMsg stops 68060s toggling interrupts faster than they can get a message through to the system. NoBypass is an AF-exclusive cure for a race condition when the 68060 tries to run two instructions simultaneously. It's far less costly than disabling Superscalar execution, the previous "fix" for this problem.