When your java application is getting unresponsive or requests are taking time, taking thread dumps will help find the cause.
If you take one thread dump, you might not be getting much information. You need to take at least 10 to 15 thread dumps with 3 to 5 seconds apart, and then you go through each thread dump to find what is causing threads to hang.
Getting thread dumps
Below is how you can take thread dumps with
- Find the JVM process id.
ps -ef | grep java
can be used to find the java pid. - Take thread dump using
jstack
for every 3 secondsfor i in $(seq 1 10); do jstack $pid > td_${i}.log; sleep 3; done
Will generate ten thread dumps, taken 3 seconds apart.
-rw-rw-r-- 1 user app 287254 Oct 29 19:02 td_1.log -rw-rw-r-- 1 user app 287254 Oct 29 19:02 td_2.log -rw-rw-r-- 1 user app 287254 Oct 29 19:02 td_3.log -rw-rw-r-- 1 user app 288211 Oct 29 19:02 td_4.log -rw-rw-r-- 1 user app 287726 Oct 29 19:02 td_5.log -rw-rw-r-- 1 user app 288594 Oct 29 19:02 td_6.log -rw-rw-r-- 1 user app 288581 Oct 29 19:03 td_7.log -rw-rw-r-- 1 user app 288581 Oct 29 19:03 td_8.log -rw-rw-r-- 1 user app 287996 Oct 29 19:03 td_9.log -rw-rw-r-- 1 user app 287983 Oct 29 19:03 td_10.log
Analyzing thread dumps
Now we have ten thread dumps first, open the first thread, and check for the request which is not responding.
For example, you know that the /atg/bcc
request is taking time, then search /atg/bcc
or respective session id
in the thread.
For example in below thread dump (td_1.log), where thread ajp-0.0.0.0-20109-24
with request uri /atg/bcc
is at socket read waiting for oracle database result.
"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.74.8 userid=iuser
36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 method=GET SAT_TRAN
S=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:128)
at oracle.net.ns.Packet.receive(Packet.java:300)
at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
at oracle.jdbc.driver.T4CPreparedStatement.fetch(T4CPreparedStatement.java:1119)
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:373)
- locked <0x000000058b076d18> (a oracle.jdbc.driver.T4CConnection)
Now we can check what is the state of this thread ajp-0.0.0.0-20109-24
in remaining threads.
grep -A 2 "ajp-0.0.0.0-20109-24" `ls -ltr td* | awk '{print $9}'` | less -i
Above will search for thread ajp-0.0.0.0-20109-24
and -A 2
will get the next two lines.
ls -ltr td* | awk '{print $9}'
will make sure that threads are listed in order.
Optionally use less -i
to open in less
to view and search for any additional patterns.
Below is how the sample output would be
td_1.log:"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.7
4.8 userid=iuser36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 met
hod=GET SAT_TRANS=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
td_1.log- java.lang.Thread.State: RUNNABLE
td_1.log- at java.net.SocketInputStream.socketRead0(Native Method)
--
td_2.log:"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.7
4.8 userid=iuser36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 met
hod=GET SAT_TRANS=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
td_2.log- java.lang.Thread.State: RUNNABLE
td_2.log- at java.net.SocketInputStream.socketRead0(Native Method)
--
...
--
td_10.log:"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.
74.8 userid=iuser36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 me
thod=GET SAT_TRANS=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
td_10.log- java.lang.Thread.State: RUNNABLE
td_10.log- at java.net.SocketInputStream.socketRead0(Native Method)
As you can see, the above thread is waiting for the oracle result in the last thread dump, too (after 30 secs). We can check the main stack trace and see if the query is taking time due to a database issue. Similarly, you can also search for any lock acquired by threads.
You can also get snapshot of all thread dumps which are waiting on oracle result.
grep -B 19 'oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next' \
`ls -ltr td* | awk '{print $9}'` \
| grep ajp | awk '{print $1}'
Output
td_1.log-"ajp-0.0.0.0-20109-24
td_2.log-"ajp-0.0.0.0-20109-24
td_4.log-"ajp-0.0.0.0-20109-27
td_5.log-"ajp-0.0.0.0-20109-27
td_6.log-"ajp-0.0.0.0-20109-27
td_7.log-"ajp-0.0.0.0-20109-27
td_7.log-"ajp-0.0.0.0-20109-23
td_8.log-"ajp-0.0.0.0-20109-27
td_8.log-"ajp-0.0.0.0-20109-23
td_9.log-"ajp-0.0.0.0-20109-27
td_9.log-"ajp-0.0.0.0-20109-23
td_10.log-"ajp-0.0.0.0-20109-27
td_10.log-"ajp-0.0.0.0-20109-23
First you search oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next
(this can be any pattern you are searching for) and use -B 19
(print 19 lines before this pattern)
and on these results, search for ajp threads.
Conclusion
By following the above steps, you can analyze thread dumps to find issues with jstack
, grep
, ls
, ‘ awk’.
Hope this helps
– RC
Comments