VCAP5-DCA Objective 6.5 – Troubleshoot vCenter Server and ESXi Host Management

Knowledge

  • Identify CLI commands and tools used to troubleshoot management issues

Skills and Abilities

  • Troubleshoot vCenter Server service and database connection issues
  • Troubleshoot the ESXi firewall
  • Troubleshoot ESXi host management and connectivity issues
  • Determine the root cause of a vSphere management or connectivity issue
  • Utilize Direct Console User Interface (DCUI) and ESXi Shell to troubleshoot, configure, and monitor an environment

Troubleshoot vCenter Server service and database connection issues

Official Documentation:

VMware KB1003926, Troubleshooting the VMware VirtualCenter Server service when it does not start or fails on vCenter Server
VMware KB1003928, vCenter Server installation fails with ODBC and DSN errors

Troubleshooting steps according to VMware KB1003926.
Validate if each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document that helps eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Do not skip a step.
To troubleshoot the VMware VirtualCenter Server service when it does not start or fails:
Note: If you perform a corrective action in any of the following steps, attempt to restart the VMware VirtualCenter Server service.

  • Verify that the VMware VirtualCenter Server service cannot be restarted. Open the Microsoft Services control panel and check the status of the service. For more information on starting the VirtualCenter service if it has stopped, see Stopping, starting, or restarting vCenter services (1003895).
  • Verify that the configuration of the ODBC Data Source (DSN) used for connection to the database for vCenter Server is correct. For more information, see vCenter Server installation fails with ODBC and DSN errors (1003928).
  • Verify that ports 902, 80, and 443 are not being used by any other application. vCenter Server cannot start if another application is using any of these ports, for example Microsoft Internet Information Server (IIS) (also known as Web Server (IIS) on Windows 2008 Enterprise), Routing and Remote Access Service (RAS), the World Wide Web Publishing Services (W3SVC), Windows Remote Management service (WS-Management), or the Citrix Licensing Support service. For more information, see Port already in use when installing vCenter Server (4824652).
    If you see an error similar to one of the following when reviewing the logs, another application may be using the ports:
    • Failed to create http proxy: Resource is already in use: Listen socket: :<port>
    • Failed to create http proxy: An attempt was made to access a socket in a way forbidden by its access permissions.
    • proxy failed on port <port>: Only one usage of each socket address (protocol/network address/port) is normally permitted.
    For more information on checking ports, see Determining if a port is in use (1003971).
  • Verify the health of the database server that is being used for vCenter Server. If the hard drives are out of space, the database transaction logs are full, or if the database is heavily fragmented, vCenter Server may not start. For more information, see Investigating the health of a vCenter Server database (1003979).
  • Verify the VMware VirtualCenter Service is running with the proper credentials. For more information, see After installing vCenter Server, the VMware VirtualCenter Server service fails to start (1004280).
  • Verify that critical folders exist on the vCenter Server host. For more information, see Missing folders on a vCenter Server prevent VirtualCenter Server service from starting (1005882).
  • Verify that no hardware or software changes have been made to the vCenter server that may have caused the failure. If you have recently made any changes to the vCenter server, undo these changes temporarily for testing purposes.
  • Before launching vCenter Server, ensure that the VMwareVCMSDS service is running.
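
The port check in the list above can be approximated with a short script. What follows is a minimal Python sketch, not a VMware tool: it tries to bind each port and reports the ones the operating system refuses. The function name find_busy_ports and the default bind address are assumptions made for illustration. Run it while the VMware VirtualCenter Server service is stopped, otherwise vCenter Server's own listeners are reported as well.

```python
import socket

# Ports that vCenter Server needs, per the KB step above.
VCENTER_PORTS = (80, 443, 902)


def find_busy_ports(ports=VCENTER_PORTS, address="0.0.0.0"):
    """Return {port: reason} for every port that cannot be bound.

    A failed bind usually means another process already listens on the
    port, but it can also mean missing privileges, so the OS error text
    is returned for review instead of being interpreted.
    """
    busy = {}
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((address, port))
        except OSError as exc:
            busy[port] = str(exc)
        finally:
            sock.close()
    return busy


if __name__ == "__main__":
    for port, reason in sorted(find_busy_ports().items()):
        print(f"port {port} is not available: {reason}")
```

A port that shows up here still has to be matched to its owning process, for example with the netstat procedure in Determining if a port is in use (1003971).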

Note: If the problem persists after trying these steps, collect diagnostic information and file a support request with VMware Support.

Troubleshooting steps according to VMware KB1003928
Checking which Data Source is being used by vCenter Server

To check which Data Source is currently being used for vCenter Server:

  1. Log into the vCenter Server as an administrator.
  2. Click Start > Run, type regedit, and press Enter. The Registry Editor window opens.
  3. Navigate to:
    HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VirtualCenter\DB
  4. The name of the Data Source that is currently being used is stored in the registry value named 1. Make note of this name for use in subsequent steps of this article.
  5. Click File > Exit without making any changes.

Viewing and modifying the database server and/or database used by vCenter Server (Microsoft SQL)

To view or modify the database server and/or database that vCenter Server is configured to use when using Microsoft SQL:

  1. Log into the vCenter Server as an administrator.
  2. Click Start > Control Panel > Administrative Tools > Data Sources (ODBC).
    For vCenter Server 4.0 running on a 64-bit host:
    Click Start > Run, type %systemdrive%\Windows\SysWoW64\Odbcad32.exe, and press Enter.
  3. Click the System DSN tab.
  4. Under System Data Sources, select the Data Source that vCenter Server is using, as noted in the previous section of this article.
  5. Click Configure.
  6. On the Configure pane you see the name of the configured database server in the server text box. To change the database server, type the name or IP address of the new server to be used in this box.
  7. Click Next.
  8. Enter the appropriate login credentials on the next page.
    Note: The login information here is not saved; it is used only to configure and test the Data Source.
  9. Click Next.
  10. On this pane, you see the database that has been configured. To change the database, ensure that the checkbox for Change the default database to is selected, and select the database that you want to use for vCenter Server.

    Note: If the database has not been selected, the default database for the account is used. To confirm you have selected the database you need, you must log into SQL.

  11. Click Next.
  12. Click Next on the next screen, making no changes.
  13. Click Finish.
  14. Click Test Data Source to verify the information entered.
  15. When the test completes, review the information presented and click OK.
  16. If the test was successful, click OK to exit the wizard. If the test did not complete successfully, click Cancel and review the information entered to ensure it is valid.
  17. Once the test is successful, click OK to exit the ODBC Data Source Administrator window.

Note: For information on modifying the SQL username and password, see Changing the vCenter Server database user ID and password (1006482).

Viewing and modifying the database server and/or database tablespace used by vCenter Server (Oracle)

To view or modify the database server and/or database that vCenter Server is configured to use when using Oracle:

  1. Log into the vCenter Server as an administrator.
  2. Click Start > Control Panel > Administrative Tools > Data Sources (ODBC).
    For vCenter Server 4.0 running on a 64-bit host:
    Click Start > Run, type %systemdrive%\Windows\SysWoW64\Odbcad32.exe, and press Enter.
  3. Click the System DSN tab.
  4. Select the Data Source that vCenter Server is using.
  5. Click Configure.
  6. In the Oracle ODBC Driver Configuration window, note the TNS Service Name.
  7. Edit the tnsnames.ora file with a text editor. This file is generally located in C:\Oracle\Oraxx\NETWORK\ADMIN (where xx is either 9I or 10g). There is an entry similar to the example below, where VPX is the TNS Service name noted in step 6:

    VPX =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS=(PROTOCOL=TCP)(HOST=server)(PORT=1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = ServerTableSpace)
        )
      )

    In this example, HOST=server is the managed host to which the client needs to connect, and SERVICE_NAME = ServerTableSpace is the TNS Service name that is being used from the Oracle server.

  8. To change the host that is being connected to, change server to the name of the new server that the Data Source connects to.
  9. To change the tablespace that is being used, modify ServerTableSpace to the name of the tablespace being used on the Oracle server.
  10. When this is complete, save and close the file.
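
Before and after editing tnsnames.ora, it can help to confirm which host and service name an alias resolves to. The following Python sketch does that with a simple regular-expression scan. It is an illustration only, not an Oracle utility: the function name parse_tns_entry is hypothetical, and the scan takes the first HOST, PORT, and SERVICE_NAME that follow the alias line, so it assumes a single-address entry like the example above.

```python
import re


def parse_tns_entry(text, alias):
    """Extract HOST, PORT and SERVICE_NAME for one alias in tnsnames.ora text.

    This is a deliberately simple regex scan, not a full tnsnames.ora
    parser: it reads from the alias line to the end of the text and takes
    the first HOST, PORT and SERVICE_NAME it finds.
    """
    start = re.search(
        rf"^\s*{re.escape(alias)}\s*=", text, re.MULTILINE | re.IGNORECASE
    )
    if start is None:
        return None
    body = text[start.end():]

    def first(key):
        match = re.search(rf"\(\s*{key}\s*=\s*([^)\s]+)\s*\)", body, re.IGNORECASE)
        return match.group(1) if match else None

    port = first("PORT")
    return {
        "host": first("HOST"),
        "port": int(port) if port else None,
        "service_name": first("SERVICE_NAME"),
    }


# The entry from the step above, used as sample input.
SAMPLE = """
VPX =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=TCP)(HOST=server)(PORT=1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = ServerTableSpace)
    )
  )
"""

if __name__ == "__main__":
    print(parse_tns_entry(SAMPLE, "VPX"))
```

In practice the text would come from reading the tnsnames.ora file, and the alias would be the TNS Service Name noted in the Oracle ODBC Driver Configuration window.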

To confirm that the changes are successful:

  1. Click Start > Control Panel > Administrative Tools > Data Sources (ODBC).
    For vCenter Server 4.0 running on a 64-bit host:
    Click Start > Run, type %systemdrive%\Windows\SysWoW64\Odbcad32.exe, and press Enter.
  2. Click the System DSN tab.
  3. Select the Data Source that vCenter Server is using.
  4. Click Configure.
  5. Click Test Connection.
  6. Enter the username and password, and click OK.
  7. Review the message that is presented. If the change was successful, click OK and exit out of the driver configuration wizard. If the test fails, review and correct the changes to the configuration and try the test again.

Modifying the username and password vCenter Server uses to connect to the database server (valid only for vCenter Server 2.5.x and below)

A common misconception is that the username and password used for vCenter Server are stored within the Data Source. The username and password for vCenter Server are stored in the registry. Instructions on resetting the password from the installer and resetting the password manually are described below.

Note: Ensure that you are using SQL authentication if you are using a Microsoft SQL server because Windows NT authentication is not supported.

To reset the username and password from the Installer:

  1. Log into the vCenter Server as an administrator.
  2. Click Start > Control Panel > Add or Remove Programs.
  3. Click VMware VirtualCenter Server or VMware vCenter Server from the list of currently installed programs.
  4. Click Change.
  5. Click Next.
  6. Select Repair.
  7. Click Next.
  8. Ensure that Use an existing database server is highlighted.
    Note: If you are using an MSDE or SQL Express installation of vCenter Server, it is set up to use Windows Authentication by default and uses the account the service is set to start with. VMware does not recommend changing this configuration.
  9. Click Next.
  10. This page is where the new username and password are entered. Ensure that the Data Source name is correct, then enter the new username and password.
  11. Click Next.
  12. Click No when you are prompted with this message:
    The DSN points to an existing VMware VirtualCenter repository. Do you want to reinitialize the database and start over with a blank configuration?
    Warning: If you click Yes, your existing configuration is overwritten with a blank new one.
  13. Click Next through the remainder of the installation, leaving the default options selected.
  14. On the Ready to Repair the program screen, click Install.
  15. When the repair is complete, exit the installer by clicking Finish.

To reset the username and password manually, without running the installer (valid for all versions of vCenter Server)

  1. Log into the vCenter Server as an administrator.
  2. Click Start > Run, type regedit, and click OK. The Registry Editor window opens.
  3. Navigate to:
    HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VirtualCenter\DB
  4. The user that is configured for database connectivity is stored in the registry value named 2.
  5. Right-click the 2 value, and click Modify.
  6. Change the Value data to the name of the new user account.
  7. Click OK.
  8. Click File > Exit without making any other changes.

To change the password, use the command line:

  • Click Start, then type cmd in the search box.
  • When cmd.exe appears, right-click it and choose Run as administrator. The command prompt window opens.
  • Navigate to the directory in which vCenter Server is installed:
    • In VirtualCenter 2.0.x, it is installed by default in:
      C:\Program Files\VMware\VMware VirtualCenter 2.0\
    • In VirtualCenter 2.5.x, vCenter Server 4.x and 5.x, it is installed by default in:
      C:\Program Files\VMware\Infrastructure\VirtualCenter Server\
  • Run the command:
    vpxd.exe -p
  • When prompted, enter the new password and press Enter.
  • Retype the password and press Enter again to complete the password change.
    Note: You must change the password through the command line, as it is encrypted in the registry.

Troubleshoot the ESXi firewall

Official Documentation:

vSphere Security Guide, Chapter 3 “Securing the Management Interface”, page 33.
Security of the ESXi management interface is critical to protect against unauthorized intrusion and misuse.

If a host is compromised in certain ways, the virtual machines it interacts with might also be compromised.

To minimize the risk of an attack through the management interface, ESXi is protected with a firewall.

General Security Recommendations

To protect the host against unauthorized intrusion and misuse, VMware imposes constraints on several parameters, settings, and activities. You can loosen the constraints to meet your configuration needs, but if you do so, make sure that you are working in a trusted environment and have taken enough other security measures to protect the network as a whole and the devices connected to the host.

Consider the following recommendations when evaluating host security and administration.

  • Limit user access.
    To improve security, restrict user access to the management interface and enforce access security policies like setting up password restrictions.
    The ESXi Shell has privileged access to certain parts of the host. Therefore, provide only trusted users with ESXi Shell login access.
    Also, strive to run only the essential processes, services, and agents such as virus checkers, and virtual machine backups.
  • Use the vSphere Client to administer your ESXi hosts.
    Whenever possible, use the vSphere Client or a third-party network management tool to administer your ESXi hosts instead of working through the command-line interface as the root user. Using the vSphere Client lets you limit the accounts with access to the ESXi Shell, safely delegate responsibilities, and set up roles that prevent administrators and users from using capabilities they do not need.
  • Use only VMware sources to upgrade ESXi components.
    The host runs a variety of third-party packages to support management interfaces or tasks that you must perform. VMware does not support upgrading these packages from anything other than a VMware source. If you use a download or patch from another source, you might compromise management interface security or functions. Regularly check third-party vendor sites and the VMware knowledge base for security alerts.

In addition to implementing the firewall, risks to the hosts are mitigated using other methods.

  • ESXi runs only services essential to managing its functions, and the distribution is limited to the features required to run ESXi.
  • By default, all ports not specifically required for management access to the host are closed. You must specifically open ports if you need additional services.
  • By default, weak ciphers are disabled and all communications from clients are secured by SSL. The exact algorithms used for securing the channel depend on the SSL handshake. Default certificates created on ESXi use SHA-1 with RSA encryption as the signature algorithm.
  • The Tomcat Web service, used internally by ESXi to support access by Web clients, has been modified to run only those functions required for administration and monitoring by a Web client. As a result, ESXi is not vulnerable to the Tomcat security issues reported in broader use.
  • VMware monitors all security alerts that could affect ESXi security and, if needed, issues a security patch.
  • Insecure services such as FTP and Telnet are not installed, and the ports for these services are closed by default. Because more secure services such as SSH and SFTP are easily available, always avoid using these insecure services in favor of their safer alternatives. If you must use insecure services and have implemented sufficient protection for the host, you must explicitly open ports to support them.

ESXi Firewall Configuration

ESXi includes a firewall between the management interface and the network. The firewall is enabled by default.

At installation time, the ESXi firewall is configured to block incoming and outgoing traffic, except traffic for the default services listed in “TCP and UDP Ports for Management Access,” on page 19.

Supported services and management agents that are required to operate the host are described in a rule set configuration file in the ESXi firewall directory /etc/vmware/firewall/. The file contains firewall rules and lists each rule’s relationship with ports and protocols. 

You cannot add a rule to the ESXi firewall unless you create and install a VIB that contains the rule set configuration file. The VIB authoring tool is available to VMware partners.

Rule Set Configuration Files

A rule set configuration file contains firewall rules and describes each rule’s relationship with ports and protocols. The rule set configuration file can contain rule sets for multiple services.

Rule set configuration files are located in the /etc/vmware/firewall/ directory. To add a service to the host security profile, VMware partners can create a VIB that contains the port rules for the service in a configuration file. VIB authoring tools are available to VMware partners only.

Each set of rules for a service in the rule set configuration file contains the following information.

  • A numeric identifier for the service, if the configuration file contains more than one service.
  • A unique identifier for the rule set, usually the name of the service.
  • For each rule, the file contains one or more port rules, each with a definition for direction, protocol, port type, and port number or range of port numbers.
  • An indication of whether the service is enabled or disabled when the rule set is applied.
  • An indication of whether the rule set is required and cannot be disabled.
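
To make the field list above concrete, the following Python sketch parses a small rule set and prints the same items: service number, rule set name, port rules, and the enabled and required flags. The sample XML is hypothetical (exampleService and port 8080 are invented), and the element names are an assumption based on the layout shown in VMware KB2008226, so compare them with a real file in /etc/vmware/firewall/ before relying on them. Port ranges are not handled in this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set, modeled on the fields listed above. The element
# names follow the layout shown in VMware KB2008226; check a real file in
# /etc/vmware/firewall/ before relying on them.
SAMPLE_RULESET = """
<ConfigRoot>
  <service id='0000'>
    <id>exampleService</id>
    <rule id='0000'>
      <direction>inbound</direction>
      <protocol>tcp</protocol>
      <porttype>dst</porttype>
      <port>8080</port>
    </rule>
    <enabled>false</enabled>
    <required>false</required>
  </service>
</ConfigRoot>
"""


def summarize_rulesets(xml_text):
    """Return one dict per <service> element with its rules and flags."""
    root = ET.fromstring(xml_text)
    services = []
    for service in root.iter("service"):
        rules = [
            {
                "direction": rule.findtext("direction"),
                "protocol": rule.findtext("protocol"),
                "porttype": rule.findtext("porttype"),
                "port": (rule.findtext("port") or "").strip(),
            }
            for rule in service.findall("rule")
        ]
        services.append(
            {
                "number": service.get("id"),
                "name": service.findtext("id"),
                "rules": rules,
                "enabled": service.findtext("enabled") == "true",
                "required": service.findtext("required") == "true",
            }
        )
    return services


if __name__ == "__main__":
    for entry in summarize_rulesets(SAMPLE_RULESET):
        print(entry)
```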

Allow or Deny Access to an ESXi Service or Management Agent

You can configure firewall properties to allow or deny access for a service or management agent.

You add information about allowed services and management agents to the host configuration file. You can enable or disable these services and agents using the vSphere Client or at the command line.

Procedure

  1. Log in to a vCenter Server system using the vSphere Client.
  2. Select the host in the inventory panel.
  3. Click the Configuration tab and click Security Profile.
    The vSphere Client displays a list of active incoming and outgoing connections with the corresponding firewall ports.
  4. In the Firewall section, click Properties.
    The Firewall Properties dialog box lists all the rule sets that you can configure for the host.
  5. Select the rule sets to enable, or deselect the rule sets to disable. The Incoming Ports and Outgoing Ports columns indicate the ports that the vSphere Client opens for the service. The Protocol column indicates the protocol that the service uses. The Daemon column indicates the status of daemons associated with the service.
  6. Click OK.

Add Allowed IP Addresses

You can specify which networks are allowed to connect to each service that is running on the host.

You can use the vSphere Client or the command line to update the Allowed IP list for a service. By default, all IP addresses are allowed.

Procedure

  1. Log in to a vCenter Server system using the vSphere Client.
  2. Select the host in the inventory panel.
  3. Click the Configuration tab and click Security Profile.
  4. In the Firewall section, click Properties.
  5. Select a service in the list and click Firewall.
  6. Select Only allow connections from the following networks and enter the IP addresses of networks that are allowed to connect to the host.
    You can enter IP addresses in the following formats: 192.168.0.0/24, 192.168.1.2, 2001::1/64, or fd3e:29a6:0a81:e478::/64.
  7. Click OK.
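
When an Allowed IP list is prepared in advance, for example in a change request, the entries can be checked for the accepted formats before they are typed into the dialog. The following Python sketch uses the standard ipaddress module for that. It is a convenience check on the administrator's side, not the validation ESXi itself performs, and the function name classify_allowed_entry is hypothetical.

```python
import ipaddress


def classify_allowed_entry(entry):
    """Return 'host', 'network' or None for one Allowed IP list entry.

    strict=False accepts entries such as 2001::1/64, where host bits are
    set in front of a prefix length.
    """
    text = entry.strip()
    try:
        if "/" in text:
            ipaddress.ip_network(text, strict=False)
            return "network"
        ipaddress.ip_address(text)
        return "host"
    except ValueError:
        return None


if __name__ == "__main__":
    examples = ["192.168.0.0/24", "192.168.1.2", "2001::1/64",
                "fd3e:29a6:0a81:e478::/64", "300.1.1.1"]
    for item in examples:
        print(item, "->", classify_allowed_entry(item))
```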

Automating Service Behavior Based on Firewall Settings

ESXi can automate whether services start based on the status of firewall ports.

Automation helps ensure that services start if the environment is configured to enable their function. For example, starting a network service only if some ports are open can help avoid the situation where services are started, but are unable to complete the communications required to complete their intended purpose. 

In addition, having accurate information about the current time is a requirement for some protocols, such as Kerberos. The NTP service is a way of getting accurate time information, but this service only works when required ports are opened in the firewall. The service cannot achieve its goal if all ports are closed. The NTP services provide an option to configure the conditions when the service starts or stops. This configuration includes options that account for whether firewall ports are opened, and then start or stop the NTP service based on those conditions. Several possible configuration options exist, all of which are also applicable to the SSH server.

  • Start automatically if any ports are open, and stop when all ports are closed: The default setting for these services that VMware recommends. If any port is open, the client attempts to contact the network resources pertinent to the service in question. If some ports are open, but the port for a particular service is closed, the attempt fails, but there is little drawback to such a case. If and when the applicable outgoing port is opened, the service begins completing its tasks.
  • Start and stop with host: The service starts shortly after the host starts and closes shortly before the host shuts down. Much like Start automatically if any ports are open, and stop when all ports are closed, this option means that the service regularly attempts to complete its tasks, such as contacting the specified NTP server. If the port was closed but is subsequently opened, the client begins completing its tasks shortly thereafter.
  • Start and stop manually: The host preserves the user-determined service settings, regardless of whether ports are open or not. When a user starts the NTP service, that service is kept running as long as the host is powered on. If the service is started and the host is powered off, the service is stopped as part of the shutdown process, but as soon as the host is powered on, the service is started again, preserving the user-determined state.

More information:

  • VMware KB2005284 “About the ESXi 5.0 firewall” presents a nice overview of the available esxcli network firewall namespace.
  • VMware KB2008226 “Creating custom firewall rules in VMware ESXi 5.0”

Troubleshoot ESXi host management and connectivity issues

Official Documentation:

Some useful reading on this topic:

Troubleshooting steps according to VMware KB1003409.

Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document to eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. After each step, try to connect to vCenter Server. Do not skip a step.

  1. Verify that the ESXi host is in a powered on state.
  2. Verify that the ESXi host can be reconnected, or if reconnecting the ESXi host resolves the issue. For more information, see Changing an ESXi or ESX host’s connection status in vCenter Server (1003480).
  3. Verify that the ESXi host is able to respond back to vCenter Server at the correct IP address. If vCenter Server does not receive heartbeats from the ESXi host, it goes into a not responding state. To verify if the correct Managed IP Address is set, see Verifying the vCenter Server Managed IP Address (1008030) and ESXi 5.0 hosts are marked as Not Responding 60 seconds after being added to vCenter Server (2020100). See also ESXi/ESX host disconnects from vCenter Server after adding or connecting it to the inventory (2040630).
  4. Verify that network connectivity exists from vCenter Server to the ESXi host. For more information, see Testing network connectivity with the ping command (1003486).
  5. Verify that you can connect from vCenter Server to the ESXi host on TCP/UDP port 902. If the host was upgraded from version 2.x and you cannot connect on port 902, then verify that you can connect on port 905. For more information, see Testing port connectivity with Telnet (1003487).
  6. Verify if restarting the ESXi Management Agents resolves the issue. For more information, see Restarting the Management agents on an ESXi or ESX host (1003490).
  7. ESXi hosts can disconnect from vCenter Server due to underlying storage issues. To investigate further, see Identifying Fibre Channel, iSCSI, and NFS storage issues on ESXi/ESX hosts (1003659).
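
The TCP part of the port 902 check in step 5 can also be done without Telnet. The following Python sketch, meant to be run on the vCenter Server system, opens a TCP connection to the host and reports the operating system's reason if it fails. The function name can_connect and the host name in the usage example are hypothetical. The sketch does not test UDP port 902, which carries the heartbeats mentioned in step 3.

```python
import socket


def can_connect(host, port=902, timeout=3.0):
    """Return (True, None) if a TCP connection succeeds, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return False, str(exc)


if __name__ == "__main__":
    # esxi01.example.com is a placeholder; replace it with the host to test.
    ok, reason = can_connect("esxi01.example.com")
    print("reachable" if ok else f"not reachable: {reason}")
```

A refused connection points at the host side (service or firewall), while a timeout more often points at the network path between vCenter Server and the host.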

Troubleshooting steps according to VMware KB1003490.

Restarting the Management agents on ESXi.

DCUI:

  1. Connect to the console of your ESXi host.
  2. Press F2 to customize the system.
  3. Log in as root.
  4. Use the Up/Down arrows to navigate to Restart Management Agents.
    Note: In ESXi 4.1 and ESXi 5.x, this option is available under Troubleshooting Options.
  5. Press Enter.
  6. Press F11 to restart the services.
  7. When the service has been restarted, press Enter.
  8. Press Esc to log out of the system.

From Local Console or SSH:

  1. Log in to SSH or Local console as root.
  2. Run these commands:
    /etc/init.d/hostd restart
    /etc/init.d/vpxa restart

    Note: In ESXi 4.x, run this command to restart the vpxa agent:
    /etc/opt/init.d/vmware-vpxa restart

Troubleshooting steps according to VMware KB1002849.

Note: Some of these steps are valid for ESX only, and not for ESXi, because the Service Console has been removed from ESXi.

The vmware-hostd management service is the main communication channel between ESX/ESXi hosts and VMkernel. If vmware-hostd fails, ESX/ESXi hosts disconnect from vCenter Server/VirtualCenter and cannot be managed, even if you try to connect to the ESX/ESXi host directly. When this happens, connection attempts to the host fail with errors.

To resolve this issue, validate that each troubleshooting step below is true for your environment. The steps provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. After each step, attempt to restart the management agents. Do not skip a step.

When the vmware-hostd service fails to respond

  • Verify network connectivity to the ESX service console or the ESXi management console. For more information, see Testing network connectivity with the ping command (1003486).
  • Verify that vmware-hostd is running. For more information, see Verifying that the Management Service is running on an ESX host (1003494) and Verifying if management services are running on an ESXi host (2030663).
  • Verify that either port 80 or port 443 is open, by running this command:
    netstat -an
    For more information, see Determining if a port is in use (1003971).
  • Verify that the /etc/hosts file is written correctly and has entries similar to:
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1 <localhost>.<localdomain> <localhost>
    10.0.0.1 <server>.<domain> <server>
  • Verify that service console partitions have available disk space. If either / or /var/log is full, then vmware-hostd cannot start because it is trying to write information to a full disk. For more information on disk space usage on the ESX host, see Investigating disk space on an ESX or ESXi host (1003564).
  • Verify that there is SAN connectivity and that SAN has been properly added or removed, by running this command:
    ls /vmfs/volumes
    or
    vdf -h
    If the commands take a very long time to complete or report an error, see Identifying shared storage issues with ESX or ESXi (1003659).
  • On an ESX host only, verify that the file /etc/vmware/esx.conf is not missing or corrupt. If the file is missing or corrupt, replace it with a backup copy from /var/log/oldconf/. For more information, see Troubleshooting an ESX host that does not boot (10065).
  • For an ESX host only, verify that there are no syntax errors in the /etc/vmware/firewall/services.xml file:
    • Check /var/log/vmware/hostd.log for these errors:
      ['ServiceSystem' 3076444288 verbose] Command finished with status 0
      ['FirewallSystem' 3076444288 verbose] Loading firewall configuration file '/etc/vmware/firewall/services.xml'
      ['App' 3076444288 panic] Application error: no element found
    • Run the command:
      esxcfg-firewall -q
      You may see this error:
      No element found at line 480, column 0, byte 11664 at /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/XML/Parser.pm line 185

    If you observe any of these errors, see Troubleshooting the firewall policy on an ESX host (1003634).

  • Verify that CPU usage is below 90%, by running this command:
    esxtop
    For more information regarding esxtop, see Using esxtop to Troubleshoot Performance Problems.
    If vmware-hostd is using more than 90% CPU, increase the amount of memory that is assigned to the ESX service console (valid for ESX only). For more information, see Increasing the amount of RAM assigned to the ESX Server service console (1003501).
    If a third-party component is using more than 90% CPU:
    • Check if HP Insight Manager process cmahostd is consuming CPU. If this process is running, upgrade HP Insight Manager.
    • Check if third-party software is running on the service console. If you have third-party products installed in the service console, stop the applicable processes and services and attempt to start the management agent.

    For more information, see Third-Party Software in the Service Console.

  • For ESX only, check any virtual machines that were migrated from ESX 2.5.x or P2Ved with VMware Converter. For more information, see vmware-hostd may use a lot of CPU or has generated a core dump on an ESX host (4718356).
  • For ESX only, check for security scanners on your network. For more information, see The ESX Management agent fails when scanned by network security scanner (1002707).
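
The /etc/hosts check in the list above can be scripted when several hosts have to be reviewed. The following Python sketch flags lines that do not start with a valid IP address, lines without a host name, and a missing loopback entry. It is a rough sanity check under those three assumptions, not a complete validator, and the function name check_hosts_text is hypothetical.

```python
import ipaddress


def check_hosts_text(text):
    """Return a list of problems found in /etc/hosts style text."""
    problems = []
    has_loopback = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            problems.append(f"line {number}: invalid IP address {fields[0]!r}")
            continue
        if len(fields) < 2:
            problems.append(f"line {number}: no host name after {fields[0]}")
        if address.is_loopback:
            has_loopback = True
    if not has_loopback:
        problems.append("no loopback (127.0.0.1) entry found")
    return problems


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        for problem in check_hosts_text(handle.read()):
            print(problem)
```

An empty result only means the file is syntactically plausible; whether the listed names and addresses are the right ones for the host still has to be checked by hand.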

When the vmware-hostd service fails to start

If the vmware-hostd service fails to start, perform these troubleshooting steps:

  • Check for failed Network File System (NFS) or Server Message Block (SMB) mounts on the ESX/ESXi host. If there are failed NFS or SMB mounts, disable or remove the mounts and restart mgmt-vmware.
  • For ESX only, check the /etc/vmware/firewall directory for any files other than services.xml. If there are any extraneous files in the directory, move them to an alternate location.
  • Check for corruption of virtual machine configuration files. For more information, see Re-registering orphaned virtual machines (1007541).
  • Check for corruption of the /etc/vmware/hostd/config.xml by looking for blank hostd logs. If the config.xml file is corrupt, reinstall it:
    • For ESX only, copy the RPM Package Manager from your installation media. On the installation CD it is located in \VMware\RPMS\VMware-hostd-xxxxx.i386.rpm.
      Note: Be sure to copy the same version of hostd for the version ESX that you are using. To find the exact version of hostd you are using, run this command:
      rpm -qa | grep hostd
    • Run this command:
      rpm -ivh --replacepkgs VMware-hostd-xxxxx.i386.rpm
  • In ESXi 5.x, run these commands to check the status of hostd and to start or stop it:
    /etc/init.d/hostd status
    /etc/init.d/hostd start
    /etc/init.d/hostd stop
  • For ESX only, check if there are any third-party monitoring applications using port 9080, such as:
    • Computer Associates (CA) Network System Manager (NSM) (R11)
    • CA Advanced System Manager (ASM) (R11.1)
    • CAeAC – etrust

    If a third-party monitoring application is using port 9080, you may see these error messages:
    ['Solo' 3076436096 info] Micro web server port: 9080
    ['App' 3076436096 panic] Application error: Address already in use
    ['App' 3076436096 panic] Backtrace generated

Disabling the services resolves the issue. For more information, see Third-Party Software in the Service Console.
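
Two of the checks in the list above lend themselves to a quick script: spotting extra files in the firewall directory, and spotting the port-conflict panic in the hostd log. The sketch below is a minimal version under stated assumptions: both function names are hypothetical, and the directory and log file are passed in by the caller rather than hard-coded (on classic ESX the hostd log normally lives at /var/log/vmware/hostd.log, on ESXi 5.x at /var/log/hostd.log).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; paths are supplied by the caller.

# extraneous_firewall_files DIR
# Print every entry in DIR other than service.xml, one name per line.
extraneous_firewall_files() {
    dir="$1"
    for f in "$dir"/*; do
        [ -e "$f" ] || continue            # nothing matched: empty directory
        name=$(basename "$f")
        [ "$name" = "service.xml" ] || echo "$name"
    done
}

# hostd_port_conflict LOGFILE
# Exit status 0 if LOGFILE contains the "Address already in use" panic
# quoted above, non-zero otherwise.
hostd_port_conflict() {
    grep -q "Application error: Address already in use" "$1"
}

# Example on a classic ESX host:
#   extraneous_firewall_files /etc/vmware/firewall
#   hostd_port_conflict /var/log/vmware/hostd.log && echo "port conflict"
```

Neither helper changes anything on the host; moving the extra files or disabling the conflicting service remains a manual step.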

If the issue continues to exist after trying the steps in this article:

Collect the VMware Support information. For more information, see Collecting diagnostic information for VMware products (1008524).

File a support request with VMware Support and note this KB Article ID (1002848) in the problem description. For more information, see How to Submit a Support Request.

Determine the root cause of a vSphere management or connectivity issue

Official Documentation:

See also Objective 6.3, section on “Analyze troubleshooting data to determine if the root cause for a given network problem originates in the physical infrastructure or vSphere environment”.

General recommendations for virtual network troubleshooting:

  • Start bottom-up instead of top-down.
  • Start with the physical layer (L1) of the OSI model and work your way up.
  • Know the concepts of Standard switches and Distributed switches.
    Understand the difference between VM portgroups and VMkernel Portgroups.
    Know how to configure VMkernel Portgroups.
    Understand physical uplinks, NIC teaming and Security settings.
    Physical NICs are connected to physical switches.
    Know how the physical switch ports are configured: access port or trunk port, and which VLANs are allowed.
  • dvSwitches can standardize configurations across all hosts but also complicate troubleshooting.
  • Avoid the urge to reboot; keep searching for the root cause first (your evidence is usually gone after a reboot).
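
The bottom-up approach above can be sketched as a tiny runner that executes checks in layer order and reports the first one that fails. This is only an illustration: the function name `first_failing_layer` and the "label|command" convention are made up here, and the commands in the usage comment (`esxcli network nic list`, `vmkping`) are examples whose grep pattern, NIC state and target address are placeholders to adapt.

```shell
#!/bin/sh
# first_failing_layer "LABEL|COMMAND" ...
# Run each COMMAND in the order given (lowest layer first) and print the
# LABEL of the first one that exits non-zero. Prints "none" if all pass.
# Hypothetical helper for illustration only.
first_failing_layer() {
    for check in "$@"; do
        label=${check%%|*}      # text before the first "|"
        cmd=${check#*|}         # everything after the first "|"
        if ! sh -c "$cmd" >/dev/null 2>&1; then
            echo "$label"
            return 1
        fi
    done
    echo "none"
}

# Example on an ESXi host (pattern and address are placeholders):
#   first_failing_layer \
#       "L1 link|esxcli network nic list | grep -q Up" \
#       "L3 gateway|vmkping -c 2 192.0.2.1"
```

Stopping at the first failing layer keeps attention on the lowest broken layer, which is the point of working bottom-up.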

Utilize Direct Console User Interface (DCUI) and ESXi Shell to troubleshoot, configure, and monitor an environment

Official Documentation:

The Direct Console User Interface (DCUI) is a limited interface for performing maintenance and troubleshooting tasks. The following are available:

  • Configuring Lockdown Mode
  • Configuring the root password
  • Configuring, testing, restarting, and restoring the Management network
  • Viewing system logs and support information
  • Rebooting
  • Enabling, disabling and accessing the ESXi Shell

The ESXi Shell includes a fully supported command list. To access the ESXi Shell from the DCUI, perform the following tasks. (Note that the ESXi Shell can also be enabled from the vSphere Client, on the Configuration -> Security Profile tab.)

  1. From the physical console of your ESXi host, press the F2 button and authenticate
  2. Select Troubleshooting Options and press Enter
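  3. Select Enable ESXi Shell and press Enter. The menu entry toggles between Enable ESXi Shell and Disable ESXi Shell, and the right-hand pane shows whether the ESXi Shell is currently enabled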
  4. You may optionally adjust the timeout
  5. Press Esc to return to the main window
  6. At the main console screen, press Alt + F1 to open the ESXi Shell
  7. From within the shell, you may run esxcli commands, esxtop, and access the local filesystem. (Note: To return to the DCUI, press Alt + F2)
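
Once the shell is open, a first look at the host might run along the lines of the sketch below. The selection of esxcli commands is illustrative rather than an official checklist, and the function name `first_look` and the `RUN` variable are hypothetical. By default the function only prints the commands (a dry run); set `RUN=1` on the host to execute them.

```shell
#!/bin/sh
# first_look
# Print (or, with RUN=1, execute) a few read-only esxcli commands that
# are useful right after opening the ESXi Shell.
# Hypothetical helper; the command selection is an assumption.
first_look() {
    for cmd in \
        "esxcli system version get" \
        "esxcli network nic list" \
        "esxcli network ip interface list" \
        "esxcli network firewall get"
    do
        if [ "${RUN:-0}" = "1" ]; then
            echo "### $cmd"
            $cmd                # word splitting is intended here
        else
            echo "$cmd"         # dry run: show what would be executed
        fi
    done
}

# Dry run (anywhere):      first_look
# On the ESXi host itself: RUN=1 first_look
```

All four commands only read state, so running them does not alter the host configuration.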

For additional information on using the esxcli command set, see this VMware Doc.

Disclaimer.
The information in this article is provided “AS IS” with no warranties, and confers no rights. This article does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion.

Marco

Marco works for ViaData as a Senior Technical Consultant. He has over 15 years of experience as a system engineer and consultant, specialized in virtualization. VMware VCP4, VCP5-DC & VCP5-DT. VMware vExpert 2013, 2014, 2015 & 2016. Microsoft MCSE & MCITP Enterprise Administrator. Veeam VMSP, VMTSP & VMCE.