Enabling process restart on failure
In a distributed environment, you can use the health management feature to monitor the status of application servers, nodes, clusters, dynamic clusters, on demand routers (ODRs), and cells so that you can detect and respond to problem areas before an outage occurs. You can manage the health of an application serving environment with a policy-driven approach that runs specific actions when monitored criteria are met. For example, when memory usage on an application server exceeds a percentage of the heap size for a specified time, health policy actions can run to correct the situation. The following list shows some of the predefined health policy actions that apply to excessive memory usage:
Take thread dumps
Take JVM heap dumps
Generate an SNMP trap
Place server in maintenance mode
Place server in maintenance mode and break affinity
Place server out of maintenance mode
You can group any of the listed actions and run them in a custom sequence to help you detect and correct the problem. To set health policies in the administrative console, click Operational policies > Health policies. The following figure shows a sequence of actions that you might set in case your server exceeds 90 percent of the JVM heap size for a period of two minutes.
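The policy logic described above (a memory condition that must hold for a period of time, followed by an ordered sequence of actions) can be sketched in Python. This is a hypothetical model for illustration only, not the WebSphere health controller API: the class and function names are invented, and the defaults simply mirror the 90 percent / two-minute example. A sample that drops back under the threshold resets the timer, so the condition is breached only by sustained high usage.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ExcessiveMemoryCondition:
    """Hypothetical excessive-memory health condition.

    Breached when heap usage stays above threshold_pct of the maximum
    heap for at least duration_s seconds without interruption.
    """
    threshold_pct: float = 90.0
    duration_s: float = 120.0
    _above_since: Optional[float] = field(default=None, init=False)

    def observe(self, timestamp_s: float, used_bytes: int, max_bytes: int) -> bool:
        """Record one heap sample; return True if the condition is breached."""
        usage_pct = 100.0 * used_bytes / max_bytes
        if usage_pct <= self.threshold_pct:
            # Dropping back under the threshold resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = timestamp_s
        return timestamp_s - self._above_since >= self.duration_s


def run_actions(actions: List[Callable[[str], str]], server: str) -> List[str]:
    """Run each action in its configured order and collect one log line per action."""
    return [action(server) for action in actions]


# Hypothetical stand-ins for some of the predefined actions listed above.
def take_thread_dump(server: str) -> str:
    return f"thread dump taken on {server}"


def take_heap_dump(server: str) -> str:
    return f"heap dump taken on {server}"


def enter_maintenance_mode(server: str) -> str:
    return f"{server} placed in maintenance mode"
```

In use, a monitor would call `observe()` for each heap sample and call `run_actions()` with the configured sequence only when `observe()` returns True.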
The two reaction modes for the health management monitor are:
Supervise: When the health condition is reached, a task is submitted with a suggested plan of action that is automatically carried out if the task is approved.
Automatic: When the health condition is reached, the actions are automatically carried out in the order you previously defined.
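The difference between the two reaction modes can be shown with a small Python model. It is a hypothetical sketch, not the product's task management interface: `HealthController`, `approve_next_task`, and `deny_next_task` are invented names. In automatic mode a breach runs the actions immediately; in supervise mode the breach only queues a task, and the actions run when that task is approved.

```python
from enum import Enum
from typing import Callable, List


class ReactionMode(Enum):
    SUPERVISE = "supervise"
    AUTOMATIC = "automatic"


class HealthController:
    """Hypothetical model of the supervise and automatic reaction modes."""

    def __init__(self, mode: ReactionMode, actions: List[Callable[[], str]]):
        self.mode = mode
        self.actions = actions
        self.pending_tasks: List[List[Callable[[], str]]] = []
        self.log: List[str] = []

    def on_breach(self) -> None:
        if self.mode is ReactionMode.AUTOMATIC:
            # Automatic: carry out the actions right away, in order.
            self._run(self.actions)
        else:
            # Supervise: submit the suggested plan as a task awaiting approval.
            self.pending_tasks.append(list(self.actions))

    def approve_next_task(self) -> None:
        if self.pending_tasks:
            self._run(self.pending_tasks.pop(0))

    def deny_next_task(self) -> None:
        if self.pending_tasks:
            self.pending_tasks.pop(0)  # discard the plan without running it

    def _run(self, actions: List[Callable[[], str]]) -> None:
        for action in actions:  # actions run in their configured order
            self.log.append(action())
```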
You can define a large number of custom health conditions and the actions to take when those conditions are breached. The Intelligent Management features help you recover from the most common operational issues, but there is also a more general way to restart your server processes: you can use native operating system functionality to restart a failed process.
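Operating system service managers implement restart-on-failure for you (on Linux, for example, a systemd unit can specify `Restart=on-failure`). To illustrate the underlying logic only, here is a minimal Python supervisor sketch; `run_with_restart` and its parameters are hypothetical and not part of any WebSphere tooling. It restarts the command whenever it exits with a non-zero status (on POSIX, a process killed by a signal also reports a non-zero, negative status), stops on a clean exit, and gives up after a bounded number of restarts to avoid an endless crash loop.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def run_with_restart(cmd: List[str], max_restarts: int = 3,
                     backoff_s: float = 1.0) -> Tuple[int, int]:
    """Run cmd, restarting it each time it exits with a non-zero status.

    Returns (restarts_performed, last_exit_status). A clean exit (status 0)
    ends supervision; after max_restarts restarts the supervisor gives up.
    """
    restarts = 0
    while True:
        status = subprocess.Popen(cmd).wait()
        if status == 0 or restarts >= max_restarts:
            return restarts, status
        restarts += 1
        time.sleep(backoff_s * restarts)  # linear back-off between restarts


if __name__ == "__main__":
    # Example: supervise a child Python process that always exits with status 3.
    print(run_with_restart([sys.executable, "-c", "import sys; sys.exit(3)"],
                           max_restarts=2, backoff_s=0.1))
```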
Periodic product maintenance is important to keep your system environment working correctly and to avoid trouble caused by known issues. At some point, you might also have a problem with a server and need to run diagnostic tests on a specific application server. Both situations can disrupt client requests to the servers in your environment.
Using the Intelligent Management feature, you can maintain the environment without disrupting traffic to the production environment. You can use it to administratively put a server or node in the cell into maintenance mode. In normal mode, the ODR sends requests to application servers. With maintenance mode, you can stop the ODR from routing to the nodes or servers that are placed into maintenance mode, so that you can maintain them with minimal disruption to your environment. The Application Placement Controller also excludes the node or server from automatic application placement. Maintenance mode is recognized only by the ODR. However, the health controller also uses server maintenance mode as an action that is taken when a health policy is breached.
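The exclusion from automatic placement can be pictured as a filter applied before any placement decision. The following Python sketch is hypothetical: the `Node` fields, the memory requirement, and the "most free memory first" ranking are invented for illustration and do not describe how the Application Placement Controller actually chooses nodes. The only behavior taken from the text is that nodes in maintenance mode are never candidates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    in_maintenance: bool = False
    free_memory_mb: int = 0


def placement_candidates(nodes: List[Node], required_mb: int) -> List[str]:
    """Return the names of nodes eligible to host a new server instance.

    Nodes in maintenance mode are skipped entirely. The remaining nodes are
    ranked by a hypothetical rule: most free memory first.
    """
    eligible = [n for n in nodes
                if not n.in_maintenance and n.free_memory_mb >= required_mb]
    eligible.sort(key=lambda n: n.free_memory_mb, reverse=True)
    return [n.name for n in eligible]
```

With nodes A (2048 MB free), B (8192 MB, in maintenance), C (4096 MB), and D (256 MB), a request for 512 MB yields C then A; B is excluded despite having the most memory.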
Node maintenance mode
You can put a node into maintenance mode when you need to apply operating system fixes or perform WebSphere maintenance. When a node is in maintenance mode, the ODR routes only traffic that has affinity to servers on the node. You can also set maintenance immediate stop mode, which immediately stops the servers on the node.
Server maintenance mode
You can put a server into maintenance mode when you need to perform server-level problem determination. When you place an application server into maintenance mode, you can choose one of these modes:
Allow all traffic to the server
Allow only traffic with affinity
Allow no traffic during the maintenance period
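The per-request effect of the node and server maintenance modes can be summarized as a routing predicate. This Python sketch is a hypothetical model, not ODR code: the enum and function names are invented, and the rule that node maintenance mode overrides an "allow all traffic" server setting is an assumption made for illustration. Here "affinity" means the request is already bound to this server, for example through an existing HTTP session.

```python
from enum import Enum


class ServerMode(Enum):
    NORMAL = "normal"
    ALLOW_ALL = "allow all traffic"
    AFFINITY_ONLY = "allow only traffic with affinity"
    NO_TRAFFIC = "allow no traffic"


def may_route(mode: ServerMode, node_in_maintenance: bool, has_affinity: bool) -> bool:
    """Decide whether a router should send a request to this server."""
    if mode is ServerMode.NO_TRAFFIC:
        return False
    if node_in_maintenance or mode is ServerMode.AFFINITY_ONLY:
        # Node maintenance mode and affinity-only server mode admit
        # only requests that already have affinity to this server.
        return has_affinity
    return True  # NORMAL or ALLOW_ALL on a node that is not in maintenance
```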
Maintenance immediate stop mode is also available, which immediately stops the application server. Each of the maintenance modes for nodes and servers can be enabled by using the administrative console or through
IBM Monitoring and Diagnostic tools for Java
IBM Monitoring and Diagnostic tools for Java are available through IBM Support Assistant, a workbench that offers a single point of access to these tools. With IBM Monitoring and Diagnostic tools for Java, you can analyze applications, garbage collection files, Java heap dump files, and Java core files.
Health Center
Health Center lets you monitor running applications in real time and provides information about memory, class loading, I/O, object allocation, and the system. This tool can help you identify application memory leaks, I/O bottlenecks, and lock contention, and it can help you tune the garbage collector. Health Center is designed to minimize the performance impact of monitoring.
Memory Analyzer
This tool analyzes the Java heap of a JVM process, identifies potential memory leaks, and reports the application's memory footprint. Memory Analyzer provides an object tree browsing function that lets you focus on the interactions between objects and analyze memory usage.
Dump Analyzer
This tool determines the causes of Java crashes by analyzing operating system dumps. This analysis can help you better understand application failures.
Garbage collection and memory visualizer
This tool helps you analyze and tune garbage collection, similar to PMAT. It also provides recommendations for optimizing the garbage collector and finding the best Java heap settings. Garbage collection and memory visualizer lets you browse garbage collection cycles and better understand the memory behavior of the application.