Analyzing high CPU consumption (IBM J2EE Engine on Linux)
[100] [CPU] [jlaunch] [load] [loop]
Related:
- xp_readerrorlog can hang and consume a CPUSymptom Under some situations it is possible on SQL Server...
Symptom
The J2EE Engine puts a constant load on one or more central processing units (CPUs) (utilization at 100%).
Other terms
CPU, load, 100%, jlaunch, loop
Reason and Prerequisites
The utilization of the CPU by Java threads must be split into two categories:
1. Threads that utilize the CPU permanently2. Threads that are noticeable depending on the situation and the system load
Threads that the system displays permanently in top or ps as having the highest percentages of CPU use indicate problems.
In the other case, a problem does not automatically exist. You can use only top to identify these threads because this supports displaying threads. The steps listed below do NOT describe how to identify threads in this category.
Depending on which Linux distribution and which IBM JDK 1.4.2 you use, different tools are available:
SLES 9
You cannot use the tools provided in SLES9 to identify threads in category 2 at least. For customers that have a Novell support contract, Novell provides a separate PTF for procps. You must install this.
IBM JDK 1.4.2 up to and including Support Release 9
Currently, you cannot assign the TID to the native ThreadID in javacore on operating system level. You must therefore determine this information using gdb.
Make sure that the most current gdb relevant for your distribution is installed. gdb is part of the Linux distribution you are using and should be installed with the available means (yast for SLES, up2date for RHEL4, yum for RHEL5).
Solution
1. Find the threads that are causing theconstant loadon the CPU.
SLES9 without the procps PTF:
To do this, execute the following command as the root user:
$>ps -eL -o c,pid,tid,ppid,comm | grep jlaunch | sort -r | head -6
When you do so, the system displays the five jlaunch threads with the highest CPU consumption.
The thread with the highest number (in the first column) is the thread that is demanding the most from the CPU.
The outputs look like this:
C PID TID PPID comm
75 31790 511 31565 jlaunch
72 31790 699 31565 jlaunch
5 31790 670 31565 jlaunch
2 31790 31854 31565 jlaunch
2 31790 31853 31565 jlaunch
In this case, threads 511 and 699 are causing the problems.
SLES9 with the procps PTF, SLES10, RHEL 4, and RHEL5:
Execute the following command as the root user:
$> top -iH
The outputs look like this:
Tasks: 513 total, 7 running, 506 sleeping, 0 stopped, 0 zombie
Cpu(s): 48.6%us, 10.5%sy, 0.0%ni, 38.3%id, 1.6%wa, 0.0%hi, 0.8%si, 0.2%st
Mem: 16777380k total, 15516100k used, 1261280k free, 224316k buffers
Swap: 2104472k total, 0k used, 2104472k free, 10478844k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
511 loladm 18 0 2549m 782m 9304 R 99 4.8 0:26.05 jlaunch
699 loladm 15 0 2549m 782m 9304 R 99 4.8 0:19.17 jlaunch
670 loladm 15 0 2549m 782m 9304 R 0 4.8 0:00.01 jlaunch
741 loladm 16 0 5916 1528 872 R 0 0.0 0:00.04 top
In this case, threads 511 and 699 are causing the problems.2. Determine the native ThreadID for the Java threads at operating-system level.
IBM JDK 1.4.2 up to and including Support Release 9
To be able to map from the TID to the native ThreadID (NPTL), you must extract the thread information of the process. However, you can only do this using gdb.
Therefore, create a text file “threadinfo.txt” with the following contents:
attach <PID>
info threads
detach
quit
Replace <PID> with the value in the second column of the problematic thread from point 1 (in this example, 31790).
Extract the thread information by calling gdb and relocate the output to a new file “gdb.out”:
$>gdb -x threadinfo.txt > gdb.out
The file “gdb.out” now contains information about the threads of the process (in this case, process 31790) and looks like this:
…
114 Thread 1106393408 (LWP 15337) 0×00002ad76b767cd1 in recv () from /lib64/libpthread.so.0
113 Thread 1106659648 (LWP 15338) 0×00002ad76b767b8b in mktemp () from /lib64/libpthread.so.0
112 Thread 1126857056 (LWP 511) 0×0000002b43a45a01 in ?? ()
…
Find the native ThreadID in the file gdb.out
In the file, search for the TID (here called LWP) that you found to be the cause of the problem in step 1. The line found also contains the native ThreadID, which must be converted to hex:
In the example, a problematic thread has the TID (=LWP) 511. The search for “LWP 511″ returns the following rows:
112 Thread 1126857056 (LWP 511) 0×0000002b43a45a01 in ?? ()
Therefore, the native ThreadID 1126857056 = 0×432A7960.
IBM JDK 1.4.2 Support Release 10 and higher
As of Support Release 10, the native ThreadID is contained directly in the javacore and must not be determined separately using gdb. The TID can be transferred directly from top but you must convert it to hexadecimal.
In the example, a problematic thread had the TID 511. Therefore, the native ThreadID 511 = 0×1FF.3. Find the hexadecimal ThreadID in the javacore file.
You can use the hexadecimal ThreadID from point 2 to locate the Java thread in the relevant javacore file.
Search for the value (without 0x, in the example, 432A7960 or 1FF):
…
3XMTHREADINFO “Thread-39″ (TID:0×0000002B1F0EDB00, sys_thread_t:0×0000002B1DE34258, state:CW, native ID:0×000000 00432A7960) prio=9
4XESTACKTRACE at java/lang/ThreadLocal.get(ThreadLocal.java:137)
4XESTACKTRACE at com/sap/tc/webdynpro/services/task/TaskBinder.getCurrentTask(TaskBinder.java:52)
…
or
…
3XMTHREADINFO “Thread-39″ (TID:0×0000002B1F0EDB00, sys_thread_t:0×0000002B1DE34258, state:CW, native ID:0×000000 00000001FF) prio=9
4XESTACKTRACE at java/lang/ThreadLocal.get(ThreadLocal.java:137)
4XESTACKTRACE at com/sap/tc/webdynpro/services/task/TaskBinder.getCurrentTask(TaskBinder.java:52)
…
This thread is causing the high load on the CPU.
If the thread is not contained in the existing javacore, create several javacores one after the other until the system displays the final thread.