Power7 High-End System Firmware

Applies to: 9125-F2C

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.



1.0 Systems Affected

This package provides firmware for Power 775 (9125-F2C) Servers only.

The firmware level in this package is:


1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" that the System Firmware requires to complete the firmware installation process. Before you start the system firmware update, the HMC level must be equal to or higher than the "Minimum HMC Code Level". If the HMC managing the server targeted for the System Firmware update is at a lower level, the firmware update will not proceed.

The Minimum HMC Code level for this firmware is:  HMC V7 R7.3.0 (PTF MH01255 or MH01256) with PTF MH01257 (Mandatory efix).

Although the Minimum HMC Code level for this firmware is listed above,  HMC V7 R7.3.0 Service Pack 7  (PTF MH01456) with security fixes (PTFs MH01500 and MH01503), or higher is recommended.
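The level comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the input is assumed to be a string written the way this document writes levels (for example "V7 R7.3.0"), and the sketch does not check the required PTFs or efixes, which must still be verified separately.

```python
import re

# Minimum level stated in this section: HMC V7 R7.3.0.
# Stored as (version, release major, release minor, release patch).
MINIMUM_HMC = (7, 7, 3, 0)


def parse_hmc_level(text):
    """Parse a string such as 'V7 R7.3.0' into a tuple of integers.

    The accepted format is an assumption based on how levels are
    written in this document, not on any HMC command output.
    """
    match = re.fullmatch(r"\s*V(\d+)\s+R(\d+)\.(\d+)\.(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized HMC level: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM_HMC):
    """Return True if the level is equal to or higher than the minimum."""
    return parse_hmc_level(text) >= minimum


if __name__ == "__main__":
    for level in ("V7 R7.2.0", "V7 R7.3.0", "V7 R7.7.0"):
        print(level, "ok" if meets_minimum(level) else "too low")
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.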

For information concerning HMC releases and the latest PTFs,  go to the following URL to access Fix Central.
http://www-933.ibm.com/support/fixcentral/

For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home

NOTE: You must be logged in as hscroot in order for the firmware installation to complete correctly.

2.0 Important Information

Additional Details About Installing This Service Pack

The new level of optical link firmware is installed automatically during the first node boot after this service pack is installed, before the optical init executes.  The installation runs in parallel with the hypervisor starting, and it prevents usage of the hub HFIs.  After the update is complete, and optical init is complete, the optical interconnects will be fully functional.  Allow for an additional 1 to 1.25 hours of boot time per node on the next reboot after installing this service pack for this operation.

This new optical module firmware fixes several issues, among them:

IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.

Memory Considerations for Firmware Upgrades

Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:
Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.

Additional information can be found at:
  http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7hat/iphatlparmemory.htm
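
As a rough worked example of the 8% rule of thumb mentioned above, the following Python sketch computes the estimate for a given amount of installed memory. The function name and the 256 GB figure are illustrative assumptions. The result is only a planning figure: the actual amount is generally lower, and the model-specific absolute minimums mentioned above are not modeled here.

```python
def firmware_memory_estimate_gb(installed_gb, fraction=0.08):
    """Estimate server firmware memory as a fraction of installed memory.

    The default fraction of 0.08 is the approximate 8% figure quoted
    in this section; treat the result as a planning estimate only.
    """
    if installed_gb < 0:
        raise ValueError("installed memory cannot be negative")
    return installed_gb * fraction


if __name__ == "__main__":
    installed = 256  # hypothetical system with 256 GB installed
    estimate = firmware_memory_estimate_gb(installed)
    print(f"{installed} GB installed: plan for about {estimate:.2f} GB of firmware memory")
```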

Downgrading firmware from any given release level to an earlier release level is not recommended.
If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.


3.0 Firmware Information and Description

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as deferred. These deferred fixes can be installed concurrently, but will not be activated until the next IPL. Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For deferred fixes within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

01ASXXX_YYY_ZZZ

NOTE: Values of service pack and last disruptive service pack level (YYY and ZZZ) are only unique within a release level (XXX). For example, 01AS330_067_045 and 01AS340_067_053 are different service packs.

An installation is disruptive if:

Example: Currently installed release is AS330, new release is AS340
Example: AS330_120_120 is disruptive, no matter what level of AS330 is currently installed on the system
Example: Currently installed service pack is AS330_120_120 and new service pack is AS330_152_130

An installation is concurrent if:

Example: Currently installed service pack is AS330_126_120,
new service pack is AS330_143_120.
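
The examples above can be turned into a small Python sketch that parses a level name and classifies an installation. The rules in is_disruptive are inferred from the examples in this section (different release level, service pack level equal to the last disruptive level, or installed service pack level below the new last disruptive level), so treat them as an assumption rather than an authoritative statement. All function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # release level, e.g. "AS330"
    service_pack: int     # YYY, service pack level
    last_disruptive: int  # ZZZ, last disruptive service pack level


def parse_level(name):
    """Parse names such as '01AS330_067_045', 'AS330_067_045' or '01AS730_140_093.rpm'."""
    match = re.fullmatch(r"(?:01)?([A-Z]{2}\d{3})_(\d{3})_(\d{3})(?:\.rpm)?", name)
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    release, service_pack, last_disruptive = match.groups()
    return FirmwareLevel(release, int(service_pack), int(last_disruptive))


def is_disruptive(installed, new):
    """Classify an installation using rules inferred from the examples above."""
    if installed.release != new.release:
        return True  # release levels differ
    if new.service_pack == new.last_disruptive:
        return True  # the new service pack is itself a disruptive level
    # Disruptive if the installed level is below the new last disruptive level.
    return installed.service_pack < new.last_disruptive


if __name__ == "__main__":
    installed = parse_level("AS330_126_120")
    new = parse_level("AS330_143_120")
    print("disruptive" if is_disruptive(installed, new) else "concurrent")
```

Keeping the release level as a string and the two service pack fields as integers matches the note above that YYY and ZZZ are only comparable within one release level.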

 
Filename Size Checksum
01AS730_140_093.rpm 37833023 45891
   
Note: The checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
For example: sum 01AS730_140_093.rpm
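
Where the AIX sum command is not at hand, the size and checksum in the table above could be checked with a Python sketch like the one below. It assumes that the default AIX sum algorithm is the byte-by-byte rotating 16-bit checksum (often called the BSD algorithm); that equivalence is an assumption here, so confirm any mismatch with sum itself. The function names are hypothetical.

```python
import os


def bsd_sum_update(checksum, data):
    """Fold a chunk of bytes into a running 16-bit rotating checksum."""
    for byte in data:
        # Rotate the 16-bit value right by one bit, then add the byte.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def bsd_sum_file(path, chunk_size=1 << 20):
    """Compute the rotating checksum of a file, reading it in chunks."""
    checksum = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            checksum = bsd_sum_update(checksum, chunk)
    return checksum


def package_matches(path, expected_size, expected_sum):
    """Return True if both the file size and the checksum match."""
    return (os.path.getsize(path) == expected_size
            and bsd_sum_file(path) == expected_sum)


if __name__ == "__main__":
    # Values taken from the Filename / Size / Checksum table above.
    path = "01AS730_140_093.rpm"
    if os.path.exists(path):
        print("match" if package_matches(path, 37833023, 45891) else "MISMATCH")
    else:
        print(f"{path} not found in the current directory")
```

Checking the size first is cheap and catches truncated downloads before the slower checksum pass.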

AS730
For Impact, Severity, and other firmware definitions, refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

The complete Firmware Fix History for this Release Level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/AS-Firmware-Hist.html

AS730_140_093 / FW731.71

08/21/14

Impact: Security         Severity:  HIPER

System firmware changes that affect all systems

  • HIPER/Pervasive:  A security problem was fixed in the OpenSSL (Secure Socket Layer) protocol that allowed clients and servers, via a specially crafted handshake packet, to use weak keying material for communication.  A man-in-the-middle attacker could use this flaw to decrypt and modify traffic between the management console and the service processor.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-0224.
  • HIPER/Pervasive:  A security problem was fixed in OpenSSL for a buffer overflow in the Datagram Transport Layer Security (DTLS) when handling invalid DTLS packet fragments.  This could be used to execute arbitrary code on the service processor.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-0195.
  • HIPER/Pervasive:  To prevent denial of service, multiple security problems were fixed in the way that OpenSSL handled read and write buffers when the SSL_MODE_RELEASE_BUFFERS mode was enabled.  These problems could cause the service processor to reset or unexpectedly drop connections to the management console when processing certain SSL commands.  The Common Vulnerabilities and Exposures issue numbers for these problems are CVE-2010-5298 and CVE-2014-0198.
  • HIPER/Pervasive:  A security problem was fixed in OpenSSL to prevent a denial of service when handling certain Datagram Transport Layer Security (DTLS) ServerHello requests. A specially crafted DTLS handshake packet could cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-0221.
  • HIPER/Pervasive:  A security problem was fixed in OpenSSL to prevent a denial of service by using an exploit of a null pointer de-reference during anonymous Elliptic Curve Diffie Hellman (ECDH) key exchange.  A specially crafted handshake packet could cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-3470.
  • Help text for the Advanced System Management Interface (ASMI) "System Configuration/Hardware Deconfiguration/Clear All Deconfiguration Errors" menu option was enhanced to clarify that when selecting "Hardware Resources" value of "All hardware resources", the service processor deconfiguration data is not cleared.
    The "Service processor" must be explicitly selected for that to be cleared.
  • A problem was fixed that prevented guard error logs from being reported for FRUs that were guarded during the system power on.  This could happen if the same FRU had been previously reported as guarded on a different power on of the system.  The requirement is now met that guarded FRUs are logged on every power on of the system.

AS730_138_093 / FW731.70

05/09/14

Impact: Availability    Severity: SPE

New Features and Functions

  • Support was dropped for Secure Sockets Layer (SSL) Version 2 and SSL weak and medium cipher suites in the service processor web server (Lighttpd).  Unsupported web browser connections to the Advanced System Management Interface (ASMI) secured port 443 (using https://) will now be rejected if those browsers do not support SSL version 3.  Supported web browsers for Power7 ASMI are Netscape (version 9.0.0.4), Microsoft Internet Explorer (version 7.0), Mozilla Firefox (version 2.0.0.11), and Opera (version 9.24).

System firmware changes that affect all systems

  • A problem was fixed that prevented the service processor from recognizing the I/O hub Host Fabric Interface (HFI) and Collective Acceleration Unit (CAU) components as valid functional units (FUs).   This caused guard reports to show "Invalid FU" as the hardware type of the components along with an incorrect "DECONFIGURED" call out hardware state.
  • A problem was fixed that caused system memory to be guarded when service processor errors on the FRU Support Interface (FSI) occurred.
  • A problem was fixed that caused a flood of predictive error (PE) logs with SRC B181E550 for Integrated Switch Router (ISR) chip recoverable errors.  The errors are logged by the service processor PRD component with signature description "io(n0p0) Undefined error code" but there is no hardware guarded.
  • A problem was fixed that caused a service processor dump to be generated with SRC B18187DA "NETC_RECV_ER" logged.
  • A problem was fixed that caused a SRC B1754201 predictive error to be logged without call out actions.  Missing call outs were added for bus errors accessing the Torrent chip.
  • A problem was fixed that could block Host Fabric Interface (HFI) array error recovery and eventually lead to a double bit error, which would cause the HFI to become unusable until the next system reboot.
  • A problem was fixed that caused an error log generated by the partition firmware to show conflicting firmware levels.  This problem occurs after a firmware update or a logical partition migration (LPM) operation on the system.
  • A problem was fixed in the isolation of PCI faults for stopped clocks so that the error would not cause a system-wide failure.  The error is now limited to the affected logical partition (LPAR).
  • A problem was fixed that caused a L2 cache error to not guard out the faulty processor, allowing the system to checkstop again on an error to the same faulty processor.
  • A problem was fixed that caused an HMC code update for the FSP to fail on the accept operation with SRC B1811402, or left the FSP unable to boot on the updated side.
  • DEFERRED: A problem was fixed that caused a system checkstop during hypervisor time keeping services. This deferred fix addresses a problem that has a very low probability of occurrence.  As such, customers may wait for the next planned service window to activate the deferred fix via a system reboot.
  • A problem was fixed that caused a loss of Time of Day (TOD) clock redundancy after a power repair of a Distributed Conversion and Control Assembly (DCCA).  After the DCCA repair, the primary and secondary TOD were assigned to the same oscillator in the DCCA that never lost power, even though both system oscillators were functional.
  • A problem was fixed that caused the system attention LED to be lit without a corresponding SRC and error log for the event.  This problem typically occurs when an operating system on a partition terminates abnormally.
  • DEFERRED: A problem was fixed that caused a system checkstop with SRC B113E504 for a recoverable hardware fault.  This deferred fix addresses a problem that has a very low probability of occurrence.  As such, customers may wait for the next planned service window to activate the deferred fix via a system reboot.

System firmware changes that affect certain systems

  • On systems running AIX or Linux, a problem was fixed that caused a partition to fail to boot with SRC CA260203.  This problem also can cause concurrent firmware updates to fail.
  • On systems using IPv6 addresses, the firmware was enhanced to reduce the time it takes to install an operating system using the Network Installation Manager (NIM).
  • On a partition with a large number of potentially bootable devices, a problem was fixed that caused the partition to fail to boot with a default catch, and SRC BA210000 may also be logged.
  • On systems in a high-performance computing (HPC) B-side cluster with an 8D_2S cross-coupled topology, a problem in the Local Network Management Controller (LNMC) was fixed that caused distance link (D-link) virtual channel (VC) deadlocks when using indirect routes.  Secondary routes had been erroneously included in the indirect route chain.  For this problem, the Executive Manager Server (EMS) will repeatedly log "VC Deadlock Error" messages in the /var/opt/isnm/cnm/logs/EVT_SUM.log file.
  • A problem was fixed in the run-time abstraction services (RTAS) extended error handling (EEH) for fundamental reset that caused partitions to crash during adapter updates.  The fundamental reset of adapters now returns a valid return code.  The adapter drivers using fundamental reset affected by this fix are the following:
    o QLogic PCIe Fibre Channel adapters (combo card)
    o IBM PCIe Obsidian
    o Emulex BE3-based ethernet adapters
    o Broadcom-based PCIe2 4-port 1Gb ethernet
    o Broadcom-based FlexSystem EN2024 4-port 1Gb ethernet for compute nodes
  • On systems with a DIMM error,  a problem was fixed in the service processor memory diagnostic that caused the de-configuration of all memory.  The memory diagnostic had failed all the memory due to special attention flooding caused by the bad hardware that did not allow the memory diagnostic to complete.   With the special attention flooding prevented, the memory diagnostic is now able to isolate the DIMM error to a FRU location and guard it so the system is able to IPL.
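
For the "VC Deadlock Error" symptom described in the LNMC fix in the list above, a quick way to see whether a cluster is affected is to count those messages in the EMS log. The Python sketch below does a plain substring match on each line. The log line layout is not described in this document, so nothing beyond the quoted message text is assumed, and the function name is hypothetical.

```python
import os

LOG_PATH = "/var/opt/isnm/cnm/logs/EVT_SUM.log"  # path quoted in the fix description
MARKER = "VC Deadlock Error"                     # message text quoted in the fix description


def count_vc_deadlock_messages(lines, marker=MARKER):
    """Count the lines that contain the marker text."""
    return sum(1 for line in lines if marker in line)


if __name__ == "__main__":
    if os.path.exists(LOG_PATH):
        # errors="replace" keeps the scan going if the log has odd bytes.
        with open(LOG_PATH, errors="replace") as log:
            print(count_vc_deadlock_messages(log), "matching lines")
    else:
        print(f"{LOG_PATH} not found on this host")
```

A steadily growing count on the EMS would match the "repeatedly log" behavior described in the fix.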

AS730_130_093 / FW731.61

10/25/13

Impact: Availability    Severity: SPE

System firmware changes that affect certain systems

  • On systems in a high-performance computing (HPC) B-side cluster with an 8D_2S cross-coupled topology, a problem in the Local Network Management Controller (LNMC) was fixed that caused distance link (D-link) virtual channel (VC) deadlocks when using indirect routes.  Secondary routes had been erroneously included in the indirect route chain.  For this problem, the Executive Manager Server (EMS) will repeatedly log "VC Deadlock Error" messages in the /var/opt/isnm/cnm/logs/EVT_SUM.log file.

AS730_125_093

03/11/13

Impact: Availability    Severity: SPE

System firmware changes that affect all systems

  • A problem was fixed that caused SRC B1813221, which indicates a failure of the battery on the service processor, to be erroneously logged after a service processor reset or power cycle.
  • A problem was fixed that caused various SRCs to be erroneously logged at boot time including B181E6C7 and B1818A14.
  • A problem was fixed that caused a system to abnormally terminate due to a null pointer reference. 
  • The firmware was enhanced to reduce "sender hang" errors and failures to boot nodes via the cluster fabric.

System firmware changes that affect certain systems
  • On large clusters, a problem was fixed that caused some links in the system to remain permanently in the DOWN_RECV_GOOD state.  The links in question will not be fully utilized for data transmission.  The problem occurs with regular frequency on large clusters when re-IPLing all CECs in the system.

AS730_118_093

11/02/12

Impact: Function    Severity: SPE

System firmware changes that affect all systems

  • DEFERRED:  A problem was fixed that could cause a live lock on the power bus resulting in a system crash.
  • The firmware was enhanced to increase the performance of certain applications by updating the routing tables.
  • A problem was fixed that caused a segmentation fault in the service processor firmware.  When this occurred, a PERC error with SRC B181C350 was logged.
  • On systems on which Internet Explorer (IE) is used to access the Advanced System Management Interface (ASMI) on the Hardware Management Console (HMC), a problem was fixed that caused IE to hang for about 10 minutes after saving changes to network parameters on the ASMI.
  • A problem was fixed that caused the gateway network address  to be shown incorrectly on the System Management Services (SMS) menus when booting a partition on an iSCSI network.
  • A problem was fixed that caused a "code accept" during a concurrent firmware installation from the HMC to fail with SRC E302F85C.
  • On storage drawers in a cross-coupled topology, an attempt to place an indirect (failover) route at an SNID location in the SRT1 route table may result in a failover route that uses the opposite compute sub-cluster as a bounce point.  The firmware was enhanced to prevent this, since there are no physical links between the two compute sub-clusters in a cross-coupled topology.  Having a failover route through the opposite compute sub-cluster will lead to packet loss and application failure.
  • A problem was fixed that prevented predictive guard errors from being deleted on the secondary service processor.  This caused hardware to be erroneously guarded out if a service processor failover occurred.
  • A problem was fixed that caused the service processor to be reset during a CEC power off or reboot.  This causes the system to terminate, followed by a platform reboot.  SRC B181E6C7 is typically logged when this problem occurs.
  • A problem was fixed that caused a system crash with unrecoverable SRC B7000103 and "ErFlightRecorder" in the failing stack.
  • A problem was fixed that caused the following symptoms on user-level jobs:

      1. During job initialization, when starting communication over the cluster fabric, an error message similar to the following is displayed:
              4:ERROR 629 fD4fs: Message type 21 from source 4
              4:MPI-PAMI ERROR: pami_init() failed with rc(1)
              4:ERROR: 0031-309 Connect failed during message passing initialization, task 4, reason:
      2. The initialization may succeed, but an HFI translation failure may occur, causing a time out on the cluster network and other side effects.

System firmware changes that affect certain systems
  • A problem was fixed that caused the dual-port Ethernet adapter, F/C 5270 and F/C 5708, to fail to power on with SRC B7006970.
  • On systems in a high-performance computing (HPC) cluster in 8D topology, a problem was fixed that caused a secondary route to be linked to an indirect route chain.  Jobs that are run in indirect route mode may experience hangs and performance problems.
  • The firmware was enhanced to improve the performance when indirect routing is used in large cluster systems.

AS730_103_093

06/27/12

Impact:  Availability      Severity:  SPE

System firmware changes that affect all systems

  • A problem was fixed that caused a segmentation fault in the service processor firmware.  When this occurred, a PERC error with SRC B181C350 was logged.

System firmware changes that affect certain systems
  • On nodes with a single DCCA running AS730_093, a problem was fixed that prevented the node from booting, with SRC 10008732 erroneously logged.

AS730_093_093

06/13/12

Impact:  Serviceability      Severity:  SPE

System firmware changes that affect all systems

  • DEFERRED:  The firmware was enhanced to fix a potential performance degradation on systems utilizing the stride-N stream prefetch instructions dcbt (with TH=1011) or dcbtst (with TH=1011).  Typical applications executing these algorithms include High Performance Computing, data intensive applications exploiting streaming instruction prefetches, and applications utilizing the Engineering and Scientific Subroutine Library (ESSL) 5.1.
  • The firmware was enhanced to correctly handle bus errors between the P7 processor chip and the I/O hub chip.
  • The firmware was enhanced to correctly diagnose the failing FRU when SRC B1xxE504 with error signature "MCFIR[14] - Hang timer detector" was logged.
  • The firmware was enhanced to improve the FRU callouts when the number of multi-bit errors on a POWER7 processor bus exceeds the threshold.  This reduces the number of FRUs replaced on a failing system.
  • A problem was fixed that caused a system to crash when the system was in low power (or safe mode), and the system attempted to switch over to nominal mode.
  • The firmware was enhanced to reduce the impact of heavy volume errors, which can be logged as "sender hang" errors.
  • The firmware was enhanced to reduce the number of "retry fetch CE" and "DRAM spare" error logs entries that call out memory DIMMs.
  • A problem was fixed that caused the first processor module in a node to be erroneously called out if an over-temperature condition was detected, instead of the processor module that was reporting the over-temperature condition.
  • The firmware was enhanced to handle the I/O hub ISR (Integrated Switch Router) link port errors as software-recoverable, rather than as hard failures.  Before this enhancement, the links would have been guarded out even though these errors were recoverable.
  • A problem was fixed that caused a service processor kernel panic due to an out-of-memory condition, with SRC B181720D.

System firmware changes that affect certain systems
  • On systems with F/C 5708 and 5270 Dual port 10GB Ethernet adapter cards installed, a problem was fixed that caused SRC B7006970 to be erroneously logged when the card was powered on.
  • In asymmetric and cross-coupled topologies, if there are no direct dlink connections between a storage drawer and a compute supernode (either through fail-in-place or through having a compute drawer or drawers at standby), then the storage drawer, upon restart or re-initialization of the lnmc daemon (lnmcd), does not provide a failover route to the target compute supernode even though there are suitable bounce points within the compute sub-cluster that can provide the indirect route.  The firmware was enhanced to provide this indirect route.

AS730_084_084

04/12/12

Impact: Function           Severity:  SPE

New Features and Functions

  • Support for cross-coupled compute-to-storage topology for a 2 drawer storage sub-cluster.
  • Support for cross-coupled compute-to-storage topology for a 4 drawer storage sub-cluster.

System firmware changes that affect all systems

  • The firmware was enhanced to allow a node to continue to boot when unrecoverable SRC B181B70C is logged.
  • A problem was fixed that caused an extraneous error log entry calling out DCCA-B and hub R5 when power was removed from DCCA-A, and the service processor and TPMD in DCCA-A were primary.
  • The firmware was enhanced to more gracefully handle the system shutdown that is required when a hypervisor hang condition was encountered.  SRCs B7000602, B182951C, B1813918 and A7001151 were logged, and a service processor failover occurred, when the hypervisor hang condition and subsequent system crash occurred.
  • The firmware was enhanced to cause the secondary service processor to automatically pick up configuration changes stored on the primary service processor.  This prevents the new configuration information from being lost if a service processor failover occurs before the secondary has picked up the new configuration information; typically this problem will only be encountered just after a system is installed.
  • The firmware was enhanced to gracefully recover, and log the correct error logs, if the secondary DCCA loses power.
  • A problem was fixed that prevented communication between the compute and storage networks in asymmetric ISR network topologies.  This affected network topologies DD2_64_8_2A, DD2_64_8_2B, DD2_64_8_4A, and DD2_64_8_4B.
  • A problem was fixed that caused SRC B181E6F1 ("RMGR_PERSISTENT_EVENT_TIMEOUT") to be erroneously logged.
  • The firmware was enhanced to reduce the number of memory DIMMs replaced due to correctable errors being logged.
  • A problem was fixed that caused unrecoverable SRC B130CD03 to be erroneously logged.
  • A problem was fixed that caused SRC B7000602 to be erroneously logged at power on.
  • The firmware was enhanced to prevent a potential deadlock in the opposite-side storage drawer if all of the cross-coupled dlinks between a compute supernode (at runtime) and a storage drawer (at runtime) are taken down.  This problem also affects indirect routing from compute to storage over cross-coupled links.
  • A problem was fixed that caused the Local Network Management Controller (LNMC) to be set to the wrong state during a service processor (DCCA) fail-over.  If this problem occurs, the most likely symptom will be a communication failure on the ISR network.
  • A problem was fixed that caused a partition running AIX to crash.
  • A new level of optical link firmware is included in this service pack, and the optical link firmware update function is enabled.  The new optical link device firmware will be automatically installed the next time the node is booted after this service pack is installed.  Please see "Additional Details About Installing This Service Pack" in the "Important Information" section.
  • The firmware was enhanced to increase the threshold of soft NVRAM errors on the service processor to 32 before SRC B15xF109 is logged.  (Replacement of the service processor is recommended if more than one B15xF109 is logged per week.)


4.0 How to Determine Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: AS730_123.

5.0 Downloading the Firmware Package

Follow the instructions on the web page. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not internet-connected, you will need to download the new firmware level to a CD-ROM or ftp server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: ASXXX_YYY_ZZZ

Where XXX = release level

Instructions for installing firmware updates and upgrades can be found at http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ha1/updupdates.htm

7.0 Firmware History

The complete Firmware Fix History for this Release level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/AS-Firmware-Hist.html

8.0 Change History

Date            Description
April 29, 2015  Corrected HMC level recommendation in section 1.1