
STABILITY

Reliability is the bedrock of Simon Goodwin's guide to building and maintaining stable Amiga systems

After many articles on souping-up your Amiga, we owe you a feature on making it more stable. The theme is avoiding crashes, not optimising performance. As you upgrade, programs or entire systems may stop working. I shall explain why, and what you can do about it.

Some Amigas are very stable, others crash almost as often as Macs and PCs. The difference can be analysed and changed, with good tools and knowledge of the right approach.

Classification

Before you can fix a fault you must understand it. In Datalink almost 20 years ago I observed that there are basically two error messages, "Bad Device" and "Probable User Error". Tandy opted for three, "What?", "How?" and "Sorry?" Unix is generally considered terse for retaining only the last character of those three, the question mark.

Little has changed, except these days you're less likely to receive any message. Graphical User Interfaces try to avoid saying anything, resorting to Gurus - notices that something awful has affected one or all of the things you are doing - and, at worst, spontaneous lock-ups and resets.

Much of my advice deals with software patches and settings, but they're no good if your hardware is liable to spontaneously self-destruct. The key word is spontaneous. Hardware problems happen regardless of what programs you are running.

Hardware Stability

If your machine crashes when running just a Shell or a quiet Workbench, you've got "hardware" problems. The Amiga OS is well-tested, reliable firmware, assuming neither you nor some malignant program has messed with the system files. Without a consistently write-protected set of system disks, you can only guess. Viruses are rarer than cockups, but both are real threats.

If you suspect a hardware fault, create a "clean" Workbench partition from original floppies and boot from that. If that does not crash when using just the standard system files, your problem is likely to be caused, and cured, by software.

But if your machine is flaky whatever you run, this is the section for you. The programs in use shouldn't matter, although if a drive fault is suspected it's worth setting up a big copy operation. Chugging away at drives also tests the power supply more effectively than silent running.

The four classic hardware problems are overheating, insufficient power, loose connectors and pushing too hard. It's best to start by checking the first two because they're the most obvious.

Balance of Power

A standard A1200 power supply accommodates one extra drive and trapdoor memory. Any significant extra load on the 12 volt rail means you'll need a stronger supply. The A500 supply is more generous, adequate for a 3.5" hard drive and a 32 bit 68K accelerator.

The main Amiga power cables supply +5 volts for digital logic, and +12V for older drives and interfaces. CDs and larger hard drives usually still need +12V but modern floppies and small hard drives manage on +5 volts alone.

If either of these voltages drops by ten per cent or more, the system is likely to fail. PowerPCs and graphics cards are particularly sensitive to voltage, and may need direct connections to ensure power gets through from a beefed-up supply.
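To make the ten per cent rule concrete, here is a small Python sketch that flags any rail whose reading has fallen to 90 per cent of nominal or below. The threshold is the article's rule of thumb, the readings are hypothetical, and `sagging_rails` is an illustrative name, not part of any Amiga tool.

```python
# A rough rail check, assuming the article's rule of thumb that a drop of
# ten per cent or more below nominal is likely to cause failures.
NOMINAL_VOLTS = {"+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.10


def sagging_rails(readings, nominal=NOMINAL_VOLTS, tolerance=TOLERANCE):
    """Return the names of rails measured at or below (1 - tolerance) x nominal."""
    flagged = []
    for rail, volts in readings.items():
        threshold = nominal[rail] * (1 - tolerance)
        if volts <= threshold:
            flagged.append(rail)
    return flagged


# Hypothetical readings taken at a floppy connector under load.
print(sagging_rails({"+5V": 4.8, "+12V": 10.6}))  # ['+12V']
```

Remember that a reading at the floppy connector is a best case; daughterboards may see less.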

You can check the main power rails with a voltmeter at the floppy connector, but the voltage may sag more at crucial places in the circuit, like daughterboards. It's dangerous to poke around with the power on, so don't try this unless you know which pins you need to test and can access them easily.

Serial and audio ports use minus 12V as well as the plus 12V supply to derive symmetrical signals. If sound is recognisable but grossly distorted, perhaps along with serial port problems, while the rest of the machine works, check the 7905 regulator in the -12V supply on a big box Amiga, or the -12V middle pin on domestic Amigas, which is rated at a tenth of an amp.

If your big-box computer crashes at bootup, the initial load might be swamping your power supply. This can even affect tower systems; one work-around is to delay spinning up drives with software or jumper settings. Big drives can apply a delay based on their SCSI ID, or can be told not to spin up until accessed - this makes initialisation slower but safer.
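The benefit of staggered spin-up is easiest to see as arithmetic. This Python sketch makes three assumptions: each drive draws a fixed surge current while spinning up, it draws a lower running current afterwards, and its delay is simply its SCSI ID times a constant. The currents, timings and function names are hypothetical, chosen only to show how delays lower the peak load on the supply.

```python
# Each drive is (scsi_id, surge_amps, surge_seconds, running_amps) on the 12V rail.
def load_at(t, drives, delay_per_id):
    """Total current drawn at time t when each drive waits scsi_id * delay_per_id."""
    total = 0.0
    for scsi_id, surge, surge_secs, running in drives:
        start = scsi_id * delay_per_id
        if t < start:
            continue  # this drive has not started spinning up yet
        total += surge if t < start + surge_secs else running
    return total


def peak_load(drives, delay_per_id=0.0):
    """Load only changes when a drive starts or finishes spinning up, so check those instants."""
    events = set()
    for scsi_id, _surge, surge_secs, _running in drives:
        start = scsi_id * delay_per_id
        events.update((start, start + surge_secs))
    return max(load_at(t, drives, delay_per_id) for t in events)


drives = [(0, 2.0, 5.0, 0.5), (1, 2.0, 5.0, 0.5), (2, 2.0, 5.0, 0.5)]
print(peak_load(drives))                     # 6.0: all three surge together
print(peak_load(drives, delay_per_id=10.0))  # 3.0: one surging, two already running
```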

Thermal Overload

Overheating should be obvious, but it's amazing how hot components made for speed can safely get; certainly hotter than you'd like to touch, even momentarily, with bare skin. Chips can run at temperatures up to 100 Centigrade, though their lifetime is reduced. I've never heard of an Earthbound computer that runs /too cool/.

Diagnosing overheating involves two symptoms - apparent heat somewhere, or even a hot smell, suggesting damage - and a system that gets increasingly crash-prone the longer you run it. Such problems are noticeably worse in warm weather.

Ventilation cures overheating, so consider raising your fanless computer off the carpet, or moving cables or other components to prevent congestion in a big Amiga. Ensure airflow around the power supply case as well as the computer. You need not drill holes, Sinclair-style, if the ones Commodore provided are unblocked, but might find it prudent to leave desktop system trapdoors open.

In extremis you can run any Amiga without its case. This minimises overheating risks but could be fatal if something conductive drops into a crucial place. This is de rigueur for Simon and Richard, but a dangerous pose unless you enjoy fiddling with hardware. Nude computers interfere with AM radio systems nearby, signalling their activity to gurus but annoying others.

Playing it Cool

The standard way to cool a computer is to waft air around with a fan, but if the wind does not pass the thing that's getting hot, the effort is wasted. Whether they blow or suck, Amiga fans fill the power supply with dust - best left alone, in that kilovolt environment - and whistle through the floppy ports.

Fans are noisy and collect dirt. They're essential in big systems but best avoided if you can make convection - the tendency of warm air to rise - create the flow for you. Heatsinks, metal blocks that carry heat away from parts that use power, extend the life of any chip that gets hot to the touch, as long as there's airflow around them. The metal case of a big Amiga is an important heatsink in its own right. Drives and boards in cramped places benefit from heatsinks, but they are of marginal benefit compared with airflow.

Tight sockets

Loose connectors are the commonest cause of intermittent problems, so major A600, A1200 and A4000 parts are soldered directly to the motherboard, rather than socketed as on earlier Amigas. It is cheaper to swap parts on an A500 or A3000, and a good way to diagnose blown components, but many faults on old machines may be relieved by cleaning chip legs and sockets with isopropanol, then plugging them back in properly.

The Amiga trapdoor ports and Zorro 3 processor socket carry the most critical signals. To trim costs, these big connectors grip less than positively, and you're lucky to find a board so well-engineered that it works first time after being refitted.

Make sure it's plugged in straight and all the way. Try again if you get a solid coloured screen when you power up, indicating a synchronisation problem, caused by a loose connector, total incompatibility or a blown motherboard, in order of probability and preference.

Drive connections

SCSI faults are rare if all lines are actively terminated at both ends of the chain. Sub-standard drives are unreliable unless you disable HDToolbox "reselection". If any drive does not appear, check that each unit on a cable has a unique ID number.

Reselection

Disable reselection in HDToolbox to cure some SCSI problems

IDE master and slave combinations are not standardised, requiring specific jumper information for all your drives. Type the part number into a Web search engine to locate drive specifications. Test drives individually if a combination fails. Limit the MaxTransfer size to 0xFE00 unless you know your IDE drive can handle more.
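The 0xFE00 figure is easier to judge in sectors. Assuming the usual 512-byte IDE sector, this quick Python check shows that 0xFE00 is exactly 127 sectors, one short of the 128 that a full 64K (0x10000) transfer would need.

```python
SECTOR_BYTES = 512  # usual IDE sector size


def sectors_per_transfer(max_transfer):
    """Split a MaxTransfer value into whole sectors and leftover bytes."""
    return divmod(max_transfer, SECTOR_BYTES)


print(hex(0xFE00), 0xFE00, sectors_per_transfer(0xFE00))     # 0xfe00 65024 (127, 0)
print(hex(0x10000), 0x10000, sectors_per_transfer(0x10000))  # 0x10000 65536 (128, 0)
```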

DriveInfo

Check the Net for details of drive setup and spin delay jumpers

The external plugs on the Amiga are liable to mechanical and consequent electrical faults. You can lock up or reset the machine from almost any of them! Serial hardware faults suggest cable or -12V supply problems, or too high a baud rate. If you fail to plug Zorro cards in properly the intelligent bus controller usually ignores them completely. Overheating boards are present from cold but disappear after a reset.

If a machine is really flaky, unplug everything you can and test a bare system. If this crashes, you need a new motherboard, or chip swaps on A3000 or earlier systems. Replace the add-ons, checking as you go. This is tedious as you have to power down between each change and the next.

Pushing too Hard

This topic could refer to determinedly inserting 23 way plugs in 25 way sockets, or IDC connectors upside down, but I'm more concerned about a "live fast, die young" mentality. If you run everything in your system at its limit, you court trouble. This fashion-victim status is the enemy of stability, aptly described as "living on the (b)leeding edge". The Amiga architecture is so open-ended that it's important to compare risks and benefits, unless you enjoy farming for its own sake.

If you overclock your processor, pile on gadgets regardless of power limits, use the top scan and baud rates, squeeze extra tracks and sectors out of your drives, run cables as far as possible in tight spaces, and insist on testing "Beta" software, you will have problems. You may enjoy fixing them, and the benefit might outweigh the cost, but your system's stability will suffer.

Every time you try something new, you run the risk of losing something you already have. Most stable computers are set up and left alone, with nothing more than backups and a little file housekeeping to disturb the applications. I dedicate one machine for tests and another to serious work.

Whenever you add or remove a card, utility or DOSdriver, keep a mental note of what you've done. That knowledge will enable you to restore a stable system after a failure, when millions of other tweaks will make no significant difference.

ShowBoards

MUI_Showboards identifies all Zorro cards working in your system

Nothing lasts forever. Mechanical components like mice, keyboards and disk drives deteriorate steadily over the years, but most survive till something traumatic happens to them. As soon as they become flaky, get another - it will only get worse, and could stop working completely at any time. There's scant difference between 0 and 100 per cent in digital systems, and anything that moves or gets hot eventually succumbs to mechanical failure.

Software Stability

The most erratic systems have hardware faults, but software causes most instability. The more programs you run, the more bugs you'll find. It's often more important to know about bugs than to fix them. About half of all fixes introduce a new problem that you're not told about - and which may be worse, when you get around to comparing it.

ateoprefs

Reduce baud rates and use RTS/CTS handshaking to avoid serial overruns

DALastAlert

Division by zero is a sure sign of sloppy programming

Many problems stem from interactions rather than a single cause, so make sure that the components of your software are compatible. Run Workbench files made for your Kickstart, rather than a hodge-podge from other versions. Match libraries; weird problems are likely if you mix RTG components or versions of IXEMUL and IXNET. To check the version of a file, type VERSION (file) FULL, (where "(file)" is the library or device you want to test).

SetPatch is the official "fix file", invented by Commodore and updated by Amiga Technologies and Amiga Inc. SetPatch collects major system bug-fixes in one program which runs at the start of any reliable startup-sequence. Recent versions suit all Amigas from Kickstart 2.0 onwards, installing just what your system needs. Type SetPatch in a shell to see the version, and what it fixed.

Setpatch

Setpatch resolves known problems on this AGA 060 Kickstart 3.1 Amiga

Locks and Crashes

Unstable Amigas may lock up, ignoring all input, or spontaneously reset. It's not easy to reset an Amiga in software; resets are normally caused by system bugs, so suspect hardware, processor libraries and patches. Your hardware might not be in a consistent state after such involuntary resets, so it's wise to power down and reboot from scratch to ensure everything starts from a clean slate.

If your computer stops dead, try pressing the Caps Lock key a dozen or more times. If the light gets stuck, on or off, communication between keyboard and processor has failed, and you must reboot. If the crash was during startup, hold both mouse buttons through reset and select "boot with no startup sequence". Rename your WBStartup and User-startup drawers temporarily, and try to boot without the hacks and extensions therein. Reintroduce these cautiously, till you find the one causing the problem.

Extra ECHO and WAIT lines in User-startup let you track progress if a system crashes while booting. WBStartup+ selectively disables startup commodities and determines the order in which extensions are loaded. Some interact badly at first, so priority juggling can persuade an otherwise incompatible collection to co-operate.

Deadlocks

If you can still move the mouse pointer, but windows are not being updated, a time-critical task is sapping your CPU power. The culprit is probably a device driver waiting for a message that will never come. Drivers and handlers run at higher priority than applications, and are meant to back off once they've done their urgent work. Deadlocks result if they keep running, so custom task priorities threaten stability.

To track and eliminate deadlocks, move mount files from your Devs to Storage drawer, then mount them individually by clicking on the WB3 icons, or issuing MOUNT commands on old Amigas, to work out which one is getting stuck. This may indicate an interface, drive or cable problem.

WAITVAL SYS: at the beginning of your startup-sequence prevents a host of error messages if the computer needs to revalidate the system partition after a reset. This is likely if a prior crash occurred while it was updating system files.

Guru reports

Motorola processors detect nonsensical instructions and trigger an "exception" which produces a "Guru" or "Software Failure" alert, stopping the offending task. If this is an application, others may continue, but it's safer to tidy up and reset, because bad code might have corrupted other programs. Guru tools and alert patchers give extra information, as discussed in my "Under The Bonnet" series last year.

OldGuru

The original Workbench 1 instability report, for Gurus only

TaskHeld

Kickstart 1's "Task Held" message is a sign of a crash in the offing

softfailure

Kickstarts 2 and above give the option to suspend tasks that would otherwise crash

Programs that stop with code "87000004" were meant to run from a Shell, and fail because they lack Workbench startup code. Start them from a direct CLI command or an IconX script, and these crashes will disappear. "8000000B" indicates a coprocessor exception, common if you try to use programs compiled for another type of FPU or MMU.
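Guru numbers are not arbitrary. The AmigaOS alert format packs a dead-end flag, a subsystem ID, a general error and a subsystem-specific error into 32 bits. This Python sketch splits a code into those fields. The subsystem table is deliberately partial, and `decode_alert` is a hypothetical helper, not an Amiga utility.

```python
# Field layout of an AmigaOS alert number (exec/alerts.h):
#   bit 31      dead-end flag
#   bits 30-24  subsystem ID
#   bits 23-16  general error
#   bits 15-0   subsystem-specific error
SUBSYSTEMS = {
    0x00: "CPU trap",
    0x01: "exec.library",
    0x02: "graphics.library",
    0x03: "layers.library",
    0x04: "intuition.library",
    0x07: "dos.library",
}


def decode_alert(code):
    """Split a 32-bit Guru number into its fields."""
    subsystem = (code >> 24) & 0x7F
    return {
        "deadend": bool(code & 0x80000000),
        "subsystem": SUBSYSTEMS.get(subsystem, f"subsystem 0x{subsystem:02X}"),
        "general": (code >> 16) & 0xFF,
        "specific": code & 0xFFFF,
    }


print(decode_alert(0x87000004))
# {'deadend': True, 'subsystem': 'dos.library', 'general': 0, 'specific': 4}
```

Decoded this way, 87000004 is a dead-end alert raised by dos.library rather than by the processor itself, while 8000000B comes straight from the CPU.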

Last Alert

Last Alert tells you the cause of a crash when you next reboot

The prevalent Guru codes detected by the processor start with 8000000 followed by 2, 3, 4 or A. These mean that the processor has encountered a daft instruction, usually because it's jumped out of the real program or something has overwritten that code. Such corruption is the prime cause of crashes, but tools can detect and prevent it.
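For processor-detected alerts, the low bits are simply the 68000-family exception vector number. A short lookup names the common cases, including the 2, 3, 4 and A codes above and the B (line-F) code met earlier. It is a hypothetical Python helper, not an existing tool.

```python
# 68000-family exception vector numbers, which AmigaOS reports as 0x800000nn.
CPU_TRAPS = {
    0x2: "bus error",
    0x3: "address error",
    0x4: "illegal instruction",
    0x5: "divide by zero",
    0x6: "CHK instruction",
    0x7: "TRAPV instruction",
    0x8: "privilege violation",
    0x9: "trace",
    0xA: "line-A (1010) opcode",
    0xB: "line-F (1111) opcode, e.g. a missing coprocessor",
}


def describe_cpu_alert(code):
    """Name the processor exception behind a CPU-trap Guru, or return None for other alerts."""
    if (code & 0x7FFF0000) != 0:
        return None  # subsystem or general-error bits are set, so not a raw CPU trap
    return CPU_TRAPS.get(code & 0xFFFF, f"exception vector {code & 0xFFFF}")


for guru in (0x80000003, 0x8000000A, 0x87000004):
    print(f"{guru:08X}", describe_cpu_alert(guru))
```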

AddressGuru

This cryptic Guru means software tries to access non-existent hardware

Memory Protection

Thomas "Thor" Richter's brilliant Guardian Angel software uncovers loads of hidden bugs in sloppy programs and patches. This is the latest in a host of tools that can detect badly-behaved programs and mitigate their effects.

BadRelease

Thor's Guardian Angel reveals a hole in AMOS memory allocation

Programs and data are interchangeable in memory. This is a great strength of the Turing/von Neumann computer architecture, but also the root of most bugs. If a program puts data in the wrong place anything could happen later, and it may be hard to connect the perpetrator with the result.

IllegalInstruction

Illegal instructions are usually meant to be data, or corrupted code

Unix and latterly Windows and Mac systems use hardware to detect memory addressing errors, but mainly to implement "virtual memory", swapping programs and data to and from temporary disk space. This is always dodgy on Amigas. GigaMem and VMM are certain to get knotted if they search a system list that has been swapped out during an AmigaOS "critical region". If you run out of space, use an application with its own VM routines, rather than a system-wide afterthought, or preferably get more real memory.

Enforcements

Amigas with Memory Management hardware can still be much more reliable than those without. Mike Sinz's "Enforcer" program trapped many program bugs that would otherwise trash memory, but it is outdated. Phase 5 ship their own version, CyberGuard. Thor's freely-distributable MMULib includes MuForce and Guardian Angel, which also monitors unallocated memory and checks that allocations and releases correspond.

DummyRead

MuForce shows how often programs "fall through" to access address zero

These "enforcers" trap and report attempts to access memory which is not owned by the task. They consume negligible resources unless your software does risky things, and are an excellent way to sort wheat from chaff. Every "hit" generates a pile of numbers, recording the local context of the exception for programmers; the task name and operation trapped give most away.

Poorly-tested C programs often try to access structures without setting a base address, so they end up fiddling around in low memory. Enforcers block writes to this sensitive area, and return a relatively safe zero on reads. Address zero on an Amiga system normally holds the value zero, and many programs rely on that to stop themselves falling off the end of null-terminated lists!
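The enforcer behaviour described above can be modelled in a few lines. This Python toy assumes the guarded area is the first 1K of memory and that every access there is reported. The class and task names are hypothetical. Real enforcers use the MMU, also cover other unmapped areas, and make exceptions such as location 4, which legitimately holds the exec library base pointer.

```python
class ToyEnforcer:
    """Guard low memory: log every access, block writes, and return zero on reads."""

    def __init__(self, size=4096, guarded_below=0x400):
        self.memory = bytearray(size)
        self.guarded_below = guarded_below
        self.hits = []  # (task, operation, address) records, like an enforcer report

    def read(self, task, address):
        if address < self.guarded_below:
            self.hits.append((task, "READ", address))
            return 0  # a relatively safe value: null-terminated walks simply stop
        return self.memory[address]

    def write(self, task, address, value):
        if address < self.guarded_below:
            self.hits.append((task, "WRITE", address))
            return  # the write never reaches memory
        self.memory[address] = value


enforcer = ToyEnforcer()
base = 0  # the bug: a structure pointer that was never set
value = enforcer.read("SloppyTool", base + 0x10)
enforcer.write("SloppyTool", base + 0x18, 0xFF)
print(value, enforcer.hits)
# 0 [('SloppyTool', 'READ', 16), ('SloppyTool', 'WRITE', 24)]
```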

Nippon Raw Fish

Low-level debugging tools, from Kickstart's built-in WACK upwards, send results directly to Amiga serial port hardware. This makes sense if a bug has clobbered the whole system, but if you don't happen to have a 9600 baud serial terminal kicking around you must divert their reports to a file or window. The classic way to do this is with Sushi, by Commodore's Carolyn Scheppner. Sashimi is the latest flavour, well worth directing to a CON window early in your user-startup.

Purify, MemSniff and many other memory management tools on AFCDs work without hardware assistance. Mungwall is the classic of this genre, and works best in conjunction with an enforcer.

Mungwall

Mungwall puts characteristic hexadecimal patterns in places programs should not touch: values like $C0DEDBAD at zero, $ABADCAFE in unused space, $DEADF00D in space reserved but uninitialised, and $DEADBEEF in memory that has been deallocated. Watch out for these giveaways in Enforcer reports.

Mungwall also allocates and marks extra space at each end of an allocation, so it can detect common problems where programmers narrowly miss the intended space. If you find such problems, save the debugger output and send it to the programmers. Distrust such applications, especially if they *write* values willy-nilly.
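Here is a minimal Python sketch of the wall-and-pattern idea, assuming 16-byte walls filled with 0xBB. The wall size and fill byte are hypothetical, though the $DEADF00D and $DEADBEEF patterns are the ones listed above. It illustrates the technique and is not how Mungwall itself is implemented.

```python
import struct

WALL_BYTE = 0xBB         # hypothetical wall fill for this sketch
WALL_SIZE = 16           # hypothetical wall width in bytes, each side
NEW_FILL = 0xDEADF00D    # reserved but uninitialised space
FREED_FILL = 0xDEADBEEF  # memory that has been deallocated


def pattern(longword, length):
    """Repeat a 32-bit value, big-endian as on a 68K, to fill `length` bytes."""
    raw = struct.pack(">I", longword)
    return (raw * (length // 4 + 1))[:length]


class WalledHeap:
    def __init__(self):
        self.blocks = {}  # handle -> (size, bytearray of wall + body + wall)
        self.next_handle = 1

    def alloc(self, size):
        wall = bytes([WALL_BYTE]) * WALL_SIZE
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = (size, bytearray(wall + pattern(NEW_FILL, size) + wall))
        return handle

    def write(self, handle, offset, data):
        """Write relative to the body with no bounds check, as a buggy program would."""
        _size, block = self.blocks[handle]
        start = WALL_SIZE + offset
        if start < 0 or start + len(data) > len(block):
            raise IndexError("missed even the walls; this sketch cannot model that")
        block[start:start + len(data)] = data

    def read(self, handle, offset, length):
        _size, block = self.blocks[handle]
        start = WALL_SIZE + offset
        return bytes(block[start:start + length])

    def free(self, handle):
        """Check both walls, then stamp the body as freed; True means the walls were intact."""
        size, block = self.blocks[handle]  # kept, so stale reads show the freed pattern
        intact = all(b == WALL_BYTE for b in block[:WALL_SIZE] + block[WALL_SIZE + size:])
        block[WALL_SIZE:WALL_SIZE + size] = pattern(FREED_FILL, size)
        return intact


heap = WalledHeap()
h = heap.alloc(8)
print(heap.read(h, 0, 4).hex())  # deadf00d: never initialised
heap.write(h, 8, b"\x00")        # one byte past the end: an off-by-one
print(heap.free(h))              # False: the tail wall was disturbed
print(heap.read(h, 0, 4).hex())  # deadbeef: reading memory after freeing it
```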

Tools like MemWatch and MemMeter highlight programs that "creep", allocating memory that they never release. This is another common Amiga programming error, for want of the "resource tracking" in Unix, Qdos and QNX. Unchecked creep eventually causes crashes.
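Resource tracking of this kind is mostly bookkeeping. This Python sketch, with hypothetical names throughout, records each allocation by address and owning task. It checks that each release names a known block of the right size (AmigaOS FreeMem() expects the size that was passed to AllocMem()), and totals what each task still holds.

```python
from collections import defaultdict


class AllocationTracker:
    """Record live allocations per task and check that each release matches an allocation."""

    def __init__(self):
        self.live = {}  # address -> (task, size)

    def on_alloc(self, task, address, size):
        self.live[address] = (task, size)

    def on_free(self, address, size):
        entry = self.live.pop(address, None)
        if entry is None:
            return "unknown block"  # never allocated, or freed twice
        if entry[1] != size:
            return "size mismatch"  # freed with a different size than was allocated
        return "ok"

    def outstanding(self):
        """Bytes still held by each task; a total that only ever grows suggests creep."""
        totals = defaultdict(int)
        for task, size in self.live.values():
            totals[task] += size
        return dict(totals)


tracker = AllocationTracker()
tracker.on_alloc("Viewer", 0x20000, 1024)
tracker.on_alloc("Viewer", 0x21000, 256)
print(tracker.on_free(0x21000, 256))  # ok
print(tracker.outstanding())          # {'Viewer': 1024}
print(tracker.on_free(0x21000, 256))  # unknown block: a double free
```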

Fragments

Memory may run out because it is fragmented - split into too many sections
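Fragmentation means that the total free memory overstates what you can actually get in one piece. This Python sketch uses a hypothetical free-memory map to compare the total with the largest contiguous block. On a real Amiga, exec's AvailMem() with the MEMF_LARGEST flag reports the largest block.

```python
def fragmentation_report(free_runs):
    """free_runs: list of (start_address, size_in_bytes) for each free region."""
    total = sum(size for _start, size in free_runs)
    largest = max((size for _start, size in free_runs), default=0)
    return total, largest


def can_allocate(free_runs, request):
    """A single allocation must fit inside one contiguous free region."""
    _total, largest = fragmentation_report(free_runs)
    return request <= largest


runs = [(0x040000, 40_000), (0x080000, 30_000), (0x0C0000, 30_000)]  # hypothetical map
print(fragmentation_report(runs))  # (100000, 40000)
print(can_allocate(runs, 64_000))  # False, despite 100,000 bytes free in total
```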

Stack Checks

Every AmigaOS task has a "stack" memory area, reserved for temporary results. If a program tries to put too much on its stack, adjacent memory gets corrupted.

The amount of stack space a task gets depends on how it is started. Workbench icon info includes a "stack" parameter. Every task needs space for its registers while another process is using the processor, and many programs are happy with just a few kilobytes of stack, but languages like C and Pascal demand more, to cope with recursion and variable allocation inside their blocks.
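The overflow mechanism is easy to model. This Python toy places the stack in the top half of a 64-byte address space, growing downwards as on a 68K, with another task's data below it. Each call pushes 12 bytes. All the sizes are hypothetical. Nothing checks the stack pointer, so deep enough recursion silently overwrites the neighbour.

```python
MEMORY_SIZE = 64    # a tiny address space for illustration
STACK_BOTTOM = 32   # stack owns [32, 64); bytes below hold another task's data
FRAME_BYTES = 12    # hypothetical bytes pushed per function call
DATA_BYTE = 0x44    # marker for the neighbouring data
FRAME_BYTE = 0xAA   # what each pushed frame contains


def corrupted_bytes(depth):
    """Recurse `depth` calls deep and count neighbouring bytes the stack overwrote."""
    memory = bytearray([DATA_BYTE]) * STACK_BOTTOM + bytearray(MEMORY_SIZE - STACK_BOTTOM)
    sp = MEMORY_SIZE  # 68K stacks grow downwards from the top

    def call(level):
        nonlocal sp
        sp -= FRAME_BYTES  # nothing compares sp with STACK_BOTTOM
        if sp < 0:
            raise RuntimeError("ran off the bottom of this toy memory")
        memory[sp:sp + FRAME_BYTES] = bytes([FRAME_BYTE]) * FRAME_BYTES
        if level < depth:
            call(level + 1)
        sp += FRAME_BYTES  # pop the frame on return

    call(1)
    return sum(1 for b in memory[:STACK_BOTTOM] if b != DATA_BYTE)


for depth in (2, 3, 4):
    print(depth, corrupted_bytes(depth))  # 2 0, then 3 4, then 4 16
```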

StackSize

Compiled programs crash if they run out of stack space

We've collected programs to monitor and manage stacks on AFCD47. Tools can dynamically report the amount of stack space a task is using, making it obvious when an overflow has occurred and a crash is impending. You can add stack space to a running program but it's safer to quit and start again with more.
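One common way to measure stack use is to fill the stack with a marker value before the task runs, then later see how much of it has been overwritten. This is not necessarily the method the AFCD47 tools use. Here is a Python sketch of that "high water mark" scan, with a hypothetical marker byte and helper names.

```python
FILL = 0xA5  # hypothetical marker written over the whole stack before the task runs


def fresh_stack(size):
    return bytearray([FILL]) * size


def high_water_mark(stack):
    """Deepest use so far: the stack grows down from the end, so scan up from the low end."""
    untouched = 0
    for byte in stack:
        if byte != FILL:
            break
        untouched += 1
    return len(stack) - untouched


stack = fresh_stack(4096)
stack[-1500:] = bytes(1500)  # the task has used 1,500 bytes at its deepest
used = high_water_mark(stack)
print(used, f"{used / len(stack):.0%}")  # 1500 37%
```

The scan can under-report if a program happens to write the marker value itself, so treat the figure as a lower bound.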

Programs like CentreQuest, NewEdit and Amiga E tasks relocate their stacks without telling the system what they've done, so snoopers show a fixed, negative space. The StackSnoop drawer includes Thor's fix for a bug in the AmigaOS console device.

To discourage stack overflows, add the line STACK 65536 at the start of your S:Shell-startup file. This gives every command 64K of stack, about 60K more than the Shell's default of around 4K. That's often wasted, but preferable to a crash if you have memory to spare.

Known problem programs

Risking howls of protest, this box categorises programs known to cause stability problems. I'm not saying you should not use these - the list includes some irreplaceable, even unavoidable programs - but you should be wary about them.

MCP, hacks and patches

Aminet and compilations like MCP abound with patches that modify system routines to fix bugs or add functions. This reconfigurability is both a strength and weakness of the Amiga. Some are innocuous, others dangerous, and layered patches often yield unexpected and unwanted results.

Hacks may introduce new bugs, for instance the original WritePixel8 chunky graphics routine is slow and corrupts its input; patches are faster but go awry if two programs try to use them at once! Angela Schmidt's Kiskometer monitors system patches, warning of programs that compete, patching the same function for different purposes.
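Layered patches go wrong in predictable ways. This Python sketch models a library jump table patched SetFunction-style, where each patch wraps the previous entry. The class, function and patch names are hypothetical. It shows how a monitor can spot two programs patching the same function. It also shows one classic failure: a patch that removes itself by restoring the entry it saved also unhooks any patch installed after it.

```python
class LibraryVectors:
    """A jump table whose entries can be replaced, SetFunction-style, at run time."""

    def __init__(self, functions):
        self.table = dict(functions)
        self.patchers = {}  # function name -> owners, in the order they patched

    def set_function(self, owner, name, make_patch):
        """Install make_patch(old) over `name`, returning the old entry as SetFunction() does."""
        old = self.table[name]
        self.table[name] = make_patch(old)
        self.patchers.setdefault(name, []).append(owner)
        return old

    def naive_remove(self, owner, name, saved_old):
        """Restore the saved entry; any patch installed later is silently lost."""
        self.table[name] = saved_old
        self.patchers[name].remove(owner)

    def competing(self):
        """Functions patched by more than one owner: the cases worth a warning."""
        return {name: owners for name, owners in self.patchers.items() if len(owners) > 1}


def draw(x):
    return f"draw {x}"


vectors = LibraryVectors({"Draw": draw})
saved = vectors.set_function("SpeedPatch", "Draw", lambda old: lambda x: old(x).upper())
vectors.set_function("LogPatch", "Draw", lambda old: lambda x: "[log] " + old(x))
print(vectors.table["Draw"](3))  # [log] DRAW 3
print(vectors.competing())       # {'Draw': ['SpeedPatch', 'LogPatch']}
vectors.naive_remove("SpeedPatch", "Draw", saved)
print(vectors.table["Draw"](3))  # draw 3: LogPatch has vanished too
```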

Kiskometer

Kiskometer monitors exactly who is changing what in your system

Ixemul and Unix ports

Unix programs may be quickly converted by linking them with IXemul, a Unix emulation library. But Unix systems expand task stacks automatically, whereas AmigaOS requires you to set a safe maximum. SnoopDOS detects programs that call IXemul and StackWatch indicates whether they're staying within safe bounds.

StackWatch

StackWatch is one of a stack of tools to monitor space (on AFCD47)

Magic User Interface

MUI makes heavy demands on graphics memory, and can crash Amigas when that runs out. To avoid this, limit screen sizes and colour depth, share screens between applications, or buy a graphics card. MUI's mass of options and plug-ins makes testing particularly difficult. Be wary about "updating" MUI custom classes and configuration tweaks. You might find a ClassAct or GadTools program that does the same job more safely and economically, if less prettily.

Naive BASICs

BASIC is a great programming language for beginners, but inspires dangerously naive coding, among compiler and interpreter writers as well as users. AMOS and Blitz BASIC run-time systems have bugs which risk crashes, especially on expanded systems. Not all programs are affected, but it's wise to run an enforcer to detect those that are.

The Crunch

Floppy disks made it fashionable to compress programs to reduce the size of the executable file; PowerPacker and Imploder were useful ways to squeeze a quart into a pint pot. However these blur the vital distinction between data and program, and were often written without proper regard for CPU caches.

If a program pauses and sometimes crashes when started, it may be badly packed. Try invoking it with your processor cache disabled, using the CPU NOCACHE shell command. You should be able to restore full speed with CPU CACHE, after unpacking. Run the file through a late version of Imploder or PowerPacker, extract the original and re-pack it safely. For optimal stability, avoid packers; they introduce avoidable risks and fragment memory.
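A toy model shows why caches upset packers. This Python sketch serves instruction fetches from a cache while the unpacker's writes go straight to memory, so freshly unpacked code is not seen until the cache is cleared. On a real Amiga, well-behaved unpackers call exec's CacheClearU() after writing code. The class and strings here are hypothetical.

```python
class ToyCPU:
    """Instruction fetches go through a cache; data writes go straight to memory."""

    def __init__(self, memory, caching=True):
        self.memory = memory  # address -> instruction
        self.icache = {}
        self.caching = caching

    def fetch(self, address):
        if not self.caching:
            return self.memory[address]  # like running after CPU NOCACHE
        if address not in self.icache:
            self.icache[address] = self.memory[address]
        return self.icache[address]

    def clear_icache(self):
        self.icache.clear()  # the job CacheClearU() does on a real Amiga


memory = {0x1000: "decruncher stub"}
cpu = ToyCPU(memory)
cpu.fetch(0x1000)                    # the stub runs, so its code is now cached
memory[0x1000] = "unpacked program"  # the decruncher writes the real code as data
print(cpu.fetch(0x1000))             # decruncher stub: stale, so the program goes astray
cpu.clear_icache()
print(cpu.fetch(0x1000))             # unpacked program
```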

Kickstart ROM

There are few serious bugs in Kickstart, but screen-swaps between modes at different scan rates can cause lockups. Kickstart 3.1 is a lot safer than earlier versions, but still not perfect, so be cautious when mixing modes. The general problem with AmigaOS is that it is lean and mean. It doesn't waste much time checking its parameters, so if programs pass it nonsense, weird things result. Richard Körber's "PatchWork" guards against these errors.

68060s and PPCs

Commodore never tested AmigaOS on any processor after the 68040, so you're exploring relatively uncharted territory with later chips. The latest fixes are on the CD. PowerUp, WarpUp, OXYpatcher and 68060.library try to make these fully compatible, but all bring problems as well as cures. PPC programs often interact badly - if you try to use more than one PPC application at a time, you're living dangerously.

GUI-Guru

Software written for old 68000s may trigger privilege violations on new chips

FixGetMsg stops 68060s toggling interrupts faster than they can get a message through to the system. NoBypass is an AF-exclusive cure for a race condition when the 68060 tries to run two instructions simultaneously. It's far less costly than disabling Superscalar execution, the previous "fix" for this problem.
