Cisco UCS Resolve Slot Issue

Introduction

This document describes the steps to troubleshoot an issue where a blade fails discovery due to a server power state-MC error.

Prerequisites

Requirements

Cisco recommends that you have a working knowledge of these topics:

Cisco Unified Computing System (UCS)
Cisco Fabric Interconnect (FI)

Components Used

The information in this document is based on these software and hardware versions:

UCS B420-M3
UCS B440-M3

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

Background Information

Possible triggers of this issue include:

Blade firmware upgrade: the server went down after the uptime policy reboot.
A power event in the data center.

Problem

This error message occurs upon a reboot or during discovery.

'Unable to change the blade power state'

UCSM reports this alert for a blade that fails to get powered on.

Troubleshoot

From the UCSM CLI shell, connect to the CIMC of the blade and verify the blade power status with the power command:

ssh FI-IP-ADDR
connect cimc X
power

Verify the sensor value:

POWER_ON_FAIL | disc -> | discrete | 0x0200 | na | na | na | na | na | na | >>> Non-working

Sensor value from a working scenario:

POWER_ON_FAIL | disc -> | discrete | 0x0100 | na | na | na | na | na | na | >>>> Working
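
The two sample rows above differ only in the fourth column (0x0200 in the non-working case, 0x0100 in the working case). A minimal Python sketch, assuming the sensors output has been saved as text in this pipe-delimited layout, can flag that difference. The function name and the value-to-state mapping are illustrative; they come only from the two rows shown here, not from a published CIMC reference.

```python
# Mapping taken only from the two sample rows in this document
# (an assumption, not an official list of sensor states).
KNOWN_STATES = {"0x0200": "non-working", "0x0100": "working"}


def power_on_fail_state(line):
    """Classify a POWER_ON_FAIL sensor row; return None for any other row."""
    fields = [f.strip() for f in line.split("|")]
    if fields[0] != "POWER_ON_FAIL":
        return None
    # The first hex-looking field is treated as the discrete sensor value.
    value = next((f.lower() for f in fields if f.lower().startswith("0x")), None)
    return KNOWN_STATES.get(value, "unknown")


if __name__ == "__main__":
    row = "POWER_ON_FAIL | disc -> | discrete | 0x0200 | na | na | na | na | na | na |"
    print(power_on_fail_state(row))
```

Any value other than the two seen in this document is reported as "unknown" rather than guessed.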

Execute the sensors command and check the values of the power and voltage sensors. Compare the output with that of a blade of the same model that is in the powered-on state.

If the Reading or Status columns show NA for certain sensors, this does not always indicate a hardware failure. I have worked on multiple cases where sensors showed NA and the issue was resolved with the recovery steps in this document.
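
The comparison against a powered-on blade of the same model can be sketched in Python as follows. This assumes both sensors outputs were saved as text in the pipe-delimited layout shown above; the function names are hypothetical, and the sketch only reports rows that differ without judging whether a difference indicates a fault.

```python
def parse_sensors(text):
    """Map each sensor name to the list of its remaining column values."""
    table = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip prompts, blank lines, and other non-table text
        fields = [f.strip() for f in line.split("|")]
        if fields[0]:
            table[fields[0]] = fields[1:]
    return table


def diff_sensors(suspect_text, reference_text):
    """Return {sensor: (suspect_columns, reference_columns)} for rows that differ.

    A sensor missing from the suspect output is reported with None.
    """
    suspect = parse_sensors(suspect_text)
    reference = parse_sensors(reference_text)
    diffs = {}
    for name, ref_cols in reference.items():
        cols = suspect.get(name)
        if cols != ref_cols:
            diffs[name] = (cols, ref_cols)
    return diffs


if __name__ == "__main__":
    suspect = "POWER_ON_FAIL | disc -> | discrete | 0x0200 | na | na |"
    reference = "POWER_ON_FAIL | disc -> | discrete | 0x0100 | na | na |"
    for sensor, (got, expected) in diff_sensors(suspect, reference).items():
        print(sensor, got, expected)
```

Rows that read NA on both blades are not reported, which matches the note above that NA alone does not always point to a hardware failure.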

Log snippets:

Sel.log:

CIMC | Platform alert POWER_ON_FAIL #0xde | Predictive Failure asserted | Asserted

power-on-fail.hist (inside tmp/techsupport_pidXXXX/CIMCX_TechSupport-nvram.tar.gz)
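
To locate the POWER_ON_FAIL entry in a longer SEL file, a small Python filter along these lines may help. It is a sketch that assumes each SEL entry is a single pipe-delimited line ending in the assertion state, as in the Sel.log line shown above; real entries may carry extra leading fields such as timestamps, which this sketch tolerates but does not parse. The function name is hypothetical.

```python
def find_power_on_fail_events(sel_text):
    """Return SEL lines that report POWER_ON_FAIL in the Asserted state."""
    events = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Match on the sensor name anywhere in the line and on the last column.
        if "POWER_ON_FAIL" in line and fields[-1] == "Asserted":
            events.append(line.strip())
    return events


if __name__ == "__main__":
    sample = (
        "CIMC | Platform alert POWER_ON_FAIL #0xde | "
        "Predictive Failure asserted | Asserted"
    )
    for event in find_power_on_fail_events(sample):
        print(event)
```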

If the checks above do not help, collect the UCSM and chassis techsupport log bundles as the next step. They help to investigate the issue further.

If you see the symptoms described above, try these steps to recover from the issue.

Step 1. FI-A/B# reset slot x/y

For example, if Chassis 2, Server 1 is impacted:

FI-A# reset slot 2/1

Wait for 30-40 seconds after running the above command.
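
When several blades are affected, it can be useful to generate the Step 1 command text from chassis and server numbers instead of typing it by hand. The helper below is a hypothetical Python sketch that only builds the string in the `reset slot x/y` form shown above; it does not connect to the Fabric Interconnect or run anything.

```python
def reset_slot_command(chassis, server):
    """Build the 'reset slot <chassis>/<server>' command string from Step 1."""
    if not (isinstance(chassis, int) and isinstance(server, int)):
        raise TypeError("chassis and server must be integers")
    if chassis < 1 or server < 1:
        raise ValueError("chassis and server numbers start at 1")
    return f"reset slot {chassis}/{server}"


if __name__ == "__main__":
    # Chassis 2, Server 1, as in the example above.
    print(reset_slot_command(2, 1))
```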

Step 2. Decommission the Blade.

Navigate to Equipment > Decommissioned tab, select the server, and click Save Changes.

Step 3. Acknowledge the Blade.

Navigate to Equipment > Chassis > Server and acknowledge the blade.

As soon as you acknowledge the blade, discovery starts. The server should now be discovered successfully.

If the steps above do not help, reseat the blade or place it in another empty slot and see if that fixes the issue.

If the server still fails discovery, contact Cisco TAC to determine whether this is a hardware issue.
