Amithash Prasad

fscd: Fix fscd KeyError log flooding after fan failure

Summary:

[Issue Description]

Related to GC20T5T7-241. When one fan module is removed or fails, the tach sensors of the fan module are detected as failed. Under this condition, fscd repeatedly reports KeyError: 'dt', causing duplicated exception messages to continuously flood the system log.

[Root Cause]

In the existing fscd flow, ctx["dt"] is only assigned under a specific condition before calling zone.run(). When chassis_intrusion_boost_flag or sensor_violated_flag is asserted, the flow enters another branch where ctx["dt"] is not initialized. If the fan failure handling path is subsequently triggered, zone.run() is still invoked with this incomplete context, which causes KeyError: 'dt'.

[Solution]

Initialize ctx["dt"] with time_difference when the context object is created. Ensure ctx["dt"] is always available before entering the fan failure handling flow and invoking zone.run(). Prevent fscd from continuously reporting KeyError: 'dt' after a fan failure is detected.
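
The change can be pictured with a minimal Python sketch. This is not the fscd source: Zone, build_ctx_before, build_ctx_after, and the boost parameter are hypothetical stand-ins, and only ctx["dt"], time_difference, and zone.run() come from the description above.

```python
class Zone:
    """Hypothetical stand-in for an fscd zone; only the ctx["dt"] lookup matters."""

    def run(self, ctx):
        # Raises KeyError: 'dt' if the caller never set the key.
        return ctx["dt"]


def build_ctx_before(time_difference, boost):
    # Before the fix (simplified): "dt" is set only on the non-boost branch.
    ctx = {}
    if not boost:
        ctx["dt"] = time_difference
    return ctx


def build_ctx_after(time_difference, boost):
    # After the fix (simplified): "dt" is set when the context is created,
    # so every later branch sees it regardless of the boost flag.
    return {"dt": time_difference}


zone = Zone()

# Old flow: the boost branch leaves "dt" unset, so zone.run() fails.
try:
    zone.run(build_ctx_before(2.0, boost=True))
    old_flow_failed = False
except KeyError:
    old_flow_failed = True

# New flow: the same call succeeds because "dt" always exists.
new_flow_dt = zone.run(build_ctx_after(2.0, boost=True))
```

Setting the key at creation time, rather than guarding each call site, keeps any later path that reaches zone.run() from seeing an incomplete context.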

Test Plan:

Test Procedure:

Remove one fan module to trigger the fan failure condition. Verify that the fan failure is detected and no repeated KeyError: 'dt' exception messages are reported.

Test Result:

One fan module was removed, and both associated tach sensors were correctly detected as failed. fscd performed the expected server power-off action due to the failed fan condition. No KeyError: 'dt' exception messages were observed after applying this fix.

root@bmc-oob:~# log-util all --print
2026 Jun 03 04:31:11 log-util: User cleared all logs
0    all      2026-06-03 04:31:13    healthd          ASSERT: Verified boot failure (3,35)
0    all      2026-06-03 04:31:13    healthd          Verified boot failure reason: U-Boot FIT did not contain the /keys node
0    all      2026-06-03 04:31:18    power-util       SLED_CYCLE starting...
6    nic      2018-03-09 04:35:07    ncsid            FRU: 6 NIC AEN Supported: 0x70307, AEN Enable Mask=0x70007
6    nic      2018-03-09 04:35:07    ncsid            FRU: 6 PLDM type supported = 0x35
6    nic      2018-03-09 04:35:07    ncsid            FRU: 6 PLDM type 0 version = 1.0.0.0
6    nic      2018-03-09 04:35:07    ncsid            FRU: 6 PLDM type 2 version = 1.1.1.0
6    nic      2018-03-09 04:35:07    ncsid            FRU: 6 PLDM type 4 version = 1.0.1.0
6    nic      2018-03-09 04:35:07    ncsid            FRU: 6 PLDM type 5 version = 1.0.0.0
6    nic      2018-03-09 04:35:07    ncsid            FRU: 6 PLDM sensor monitoring enabled
0    all      2018-03-09 04:35:07    healthd          SLED Powered OFF at Wed Jun  3 04:31:18 2026
0    all      2018-03-09 04:35:08    ipmid            PWR Fault : P12V_FAN_0_PG ,ASSERTED
0    all      2018-03-09 04:35:09    check_nic_status Checking NIC power status...
0    all      2018-03-09 04:35:09    check_nic_status NIC firmware detected, no recovery needed
1    server   2018-03-09 04:35:11    power-util       SERVER_12V_ON successful for FRU: 1
1    server   2018-03-09 04:35:14    ipmid            SEL Entry: FRU: 1, Record: Standard (0x02), Time: 2018-03-09 04:35:14, Sensor: ME_POWER_STATE (0x16), Event Data: (000000) RUNNING Assertion
1    server   2018-03-09 04:35:16    gpiod            FRU: 1, Server is powered on
0    all      2026-06-03 04:32:34    sync-date        Syncing up BMC time from Host(ME)
0    all      2026-06-03 04:32:34    healthd          SLED Powered ON at Wed Jun  3 04:31:47 2026
1    server   2026-06-03 04:33:59    ipmid            SEL Entry: FRU: 1, Record: Facebook Unified SEL (0xFB), GeneralInfo: POST(0x28), POST Failure Event: BIOS fails to get the certificate from BMC, Failure Detail: No certificate at BMC
0    all      2026-06-03 04:35:17    fscd             2 fans failed
0    all      2026-06-03 04:35:17    fscd             0 Front dead, 0 RPM
0    all      2026-06-03 04:35:17    fscd             0 Rear dead, 0 RPM
1    server   2026-06-03 04:35:18    ipmid            SEL Entry: FRU: 1, Record: Facebook Unified SEL (0xFB), GeneralInfo: POST(0x28), POST Failure Event: System HTTP boot fail, Fail Type: IPv6 fail, Error Code: 0x63
1    server   2026-06-03 04:35:20    gpiod            FRU: 1, Server is powered off
4    dpb      2026-06-03 04:35:23    ipmid            DEASSERT: Upper Critical threshold - settled - FRU: 4, num: 0xe0 curr_val: 70 mV, thresh_val: 80 mV, snr: PTB_VSENSE_GND_VOLT_V
0    all      2026-06-03 04:35:26    fscd             SERVER_POWER_OFF due to failed fan over threshold
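
A check like the one in the test plan could be scripted roughly as follows. This is a sketch only: the column layout (FRU number, FRU name, date, time, application, message) is inferred from the output above, the helper names are hypothetical, and it assumes the exception would appear as an fscd entry in this output.

```python
import re

# Assumed layout of one entry: FRU number, FRU name, date, time, app, message.
ENTRY = re.compile(
    r"^(?P<fru>\d+)\s+(?P<name>\S+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<app>\S+)\s+(?P<msg>.*)$"
)


def app_messages(log_text, app):
    """Return the messages logged by `app`, skipping lines that do not parse."""
    messages = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match and match.group("app") == app:
            messages.append(match.group("msg"))
    return messages


def count_keyerrors(log_text, app="fscd"):
    """Count entries from `app` whose message mentions KeyError."""
    return sum("KeyError" in msg for msg in app_messages(log_text, app))


# A few lines copied from the output above.
sample = """\
2026 Jun 03 04:31:11 log-util: User cleared all logs
0    all      2026-06-03 04:35:17    fscd             2 fans failed
0    all      2026-06-03 04:35:17    fscd             0 Front dead, 0 RPM
1    server   2026-06-03 04:35:20    gpiod            FRU: 1, Server is powered off
0    all      2026-06-03 04:35:26    fscd             SERVER_POWER_OFF due to failed fan over threshold
"""

print(count_keyerrors(sample))
print(app_messages(sample, "fscd"))
```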

Reviewed By: jamesatha

Differential Revision: D107419491

fbshipit-source-id: 2bd898f5809bec096ca19c1e83f437611cc83cab


OpenBMC

OpenBMC is an open software framework to build a complete Linux image for a Board Management Controller (BMC).

OpenBMC uses the Yocto Project as the underlying building and distro generation framework.

Board            Description
Wedge            A 40G OS-agnostic TOR switch
Yosemite         An open source modular chassis for high-powered microservers
Lightning        A flexible NVMe JBOF
Wedge100         A 32x100G TOR switch
Backpack LC/FC   Linecard and fabric card in a 128x100G modular open switch
Backpack CMM     Chassis management module in a 128x100G modular open switch
Tioga Pass       A dual-socket compute platform
YosemiteV2       A refresh of Yosemite
Bryce Canyon     Disk Storage platform
Grand Canyon     Disk Storage platform

Contents

This repository includes three sets of layers:

  • OpenBMC Common Layer - Common packages and recipes that can be used in different types of BMC.
  • BMC System-on-Chip (SoC) Layer - SoC specific drivers and tools. This layer includes the bootloader (u-boot) and the Linux kernel. Both the bootloader and the Linux kernel shall include the hardware drivers specific to the SoC.
  • Board Specific Layer - Board specific drivers, configurations, and tools. This layer defines how to configure the image. It also defines which packages are installed in an OpenBMC image for this board. Any board specific initialization and tools are also included in this layer.

File structure

The Yocto naming pattern is used in this repository. A “meta-layer” name is used for a layer or a category of layers, and recipe-abc is used to name a recipe. The project is itself a meta layer: within the Yocto Project’s distribution, this project is called meta-openbmc.

The recipes for OpenBMC common layer are found in common.

The BMC SoC layer and the board specific layer are grouped together based on the vendor/manufacturer name. For example, all code specific to Facebook boards should be in meta-facebook. Likewise, meta-aspeed includes source code for Aspeed SoCs.

How to build

Note: In the instructions below, the platform named in some of the steps is an example only and needs to be replaced with the respective platform when setting up for a different platform.

  1. Set up the build environment based on the Yocto Project’s Quick Start Guide.

  2. Clone the OpenBMC repository and other open source repositories:

    $ git clone -b helium https://github.com/facebook/openbmc.git
    $ cd openbmc
    $ ./sync_yocto.sh
  3. Initialize a build directory for the platform to build. In the openbmc directory:

    $ source openbmc-init-build-env wedge

    Choose between wedge, wedge100, yosemite, or any of the other platforms listed in the meta-facebook directory. After this step, you will be dropped into a build directory, openbmc/build.

  4. Start the build within the build directory. In general, to build for a platform:

    $ bitbake <platform>-image

    The build process automatically fetches all necessary packages and builds the complete image. The final build results are in openbmc/build/tmp/deploy/images/<platform>. The root password will be 0penBmc; you may change this in the local configuration.

Build Artifacts

  • u-boot.bin - This is the u-boot image for the board.
  • uImage - This is the Linux kernel for the board.
  • -image-.cpio.lzma.u-boot - This is the rootfs for the board.
  • flash- - This is the complete flash image including u-boot, kernel, and the rootfs.
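
As a quick sanity check after a build, the deploy directory can be scanned for these artifacts. The following is a minimal sketch: the helper names are hypothetical, and because the rootfs and flash image names vary per platform, it matches them only by the suffix and prefix shown in the list above.

```python
from pathlib import Path


def find_artifacts(deploy_dir):
    """Map each artifact kind from the list above to the matching file names."""
    names = sorted(p.name for p in Path(deploy_dir).iterdir() if p.is_file())
    return {
        "u-boot": [n for n in names if n == "u-boot.bin"],
        "kernel": [n for n in names if n == "uImage"],
        # Platform-specific names: match on the known suffix / prefix only.
        "rootfs": [n for n in names if n.endswith(".cpio.lzma.u-boot")],
        "flash": [n for n in names if n.startswith("flash-")],
    }


def missing_artifacts(deploy_dir):
    """Return the artifact kinds that have no matching file."""
    return [kind for kind, files in find_artifacts(deploy_dir).items() if not files]
```

Calling missing_artifacts() on openbmc/build/tmp/deploy/images/<platform>, with the placeholder replaced by the platform that was built, should return an empty list when all four kinds are present.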

Kernel & U-Boot Development

By default, the OpenBMC build process fetches and builds the Linux kernel and U-Boot directly from GitHub.

  • To make local kernel or U-Boot changes and build with the modified sources:

In the build directory, run

$ devtool modify linux-aspeed

or

$ devtool modify u-boot

This creates a local Linux package under /workspace/sources/linux-aspeed for development.

  • To go back to default recipes, run
    $ devtool reset linux-aspeed

FAQ

  1. The BMC controls the system and fans based on the sensor/device status (I assume it may even shut down in case of multiple failures or high temperature). How can we debug such issues? Are any event or critical logs maintained in the BMC? Can we have a list of files to look into in case of such issues?

Answer: To debug those issues, refer to the logs:

  A. For REST API related issues, look at the REST logs under /tmp/ (example: /tmp/rest.log).
  B. For FSCD related issues, look at the fscd logs under /var/log/ (example: /var/log/fscd.log).
  C. For the mTerm log (data from the x86 CPU side), look at /var/log/mTerm.log (it’s usually /var/log/mTerm_wedge.log on most platforms).
  D. Some persistent logs also go to the /mnt/data/ partition.
  E. For everything else, look at /var/log/messages.
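
When triaging, it can help to pull the tail of each of these files in one pass. The sketch below assumes Python 3 is available on the BMC; the helper name and the default of 20 lines are arbitrary choices, and the paths are the examples from the answer above, which may not all exist on a given platform.

```python
from collections import deque
from pathlib import Path

# Example paths from the answer above; availability depends on the platform.
DEFAULT_LOGS = [
    "/tmp/rest.log",
    "/var/log/fscd.log",
    "/var/log/mTerm.log",
    "/var/log/mTerm_wedge.log",
    "/var/log/messages",
]


def tail_logs(paths=DEFAULT_LOGS, lines=20):
    """Return {path: last `lines` lines} for every log file that can be read."""
    tails = {}
    for path in paths:
        log = Path(path)
        if not log.is_file():
            continue  # this log does not exist on this platform
        try:
            with log.open(errors="replace") as handle:
                # deque with maxlen keeps only the final `lines` rows.
                rows = deque(handle, maxlen=lines)
        except OSError:
            continue  # unreadable (for example, permissions); skip it
        tails[path] = [row.rstrip("\n") for row in rows]
    return tails


if __name__ == "__main__":
    for path, rows in tail_logs().items():
        print(f"== {path}")
        print("\n".join(rows))
```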

  2. How do we configure the BMC sensor thresholds for fan, temperature, and other sensors? Is there a command that can be used from the OpenBMC shell?

Answer: For fan RPM, you can run set_fan_speed.sh from the OpenBMC shell to change it (use get_fan_speed.sh to read the value back). Some platforms, especially storage/compute ones, use fan-util. Those scripts are under /usr/local/bin on the BMC. Keep in mind that the fscd process keeps adjusting the fan speed, so your changed values won’t stay for long unless you turn off the watchdog and kill fscd. If you want to change the temperature threshold, you will have to modify the code and build a new BMC image.

How can I contribute?

If you have an application that can be used by different BMCs, you can contribute your application to the OpenBMC common layer.

If you are a BMC SoC vendor, you can contribute your SoC specific drivers to the BMC SoC layer.

If you are a board vendor, you can contribute your board specific configurations and tools to the Board specific layer. If the board uses a new BMC SoC that is not part of the BMC SoC layer, the SoC specific driver contribution to the BMC SoC layer is also required.

License

OpenBMC is Apache licensed, as found in the LICENSE file.
