Friday, November 8, 2013

Backend WLS or EM application seems to be down

A few days ago, the server where OMS and its DB reside crashed.
After we restarted it, every attempt to access the grid console returned the error "Backend WLS or EM application seems to be down".
Agents failed to upload XML files to OMS, and "emctl pingOMS" returned the error "EMD pingOMS error: No response header from OMS".
We checked the WebLogic and OMS .trc, .log and .out files, but no error was recorded either before or after the crash.

We had to restart the database and then restart Grid Control 12c.

To correct this issue:

    1. Stop OMS 
     /u00/app/gc/oms12cr3/oms/bin/emctl stop oms -all

    2. Kill (kill -9) any WebLogic and OMS processes that are still running after the stop. You can find these processes using ps.
     ps -ef | grep EMGC_ADMINSERVER
     ps -ef | grep EMGC_OMS1
     ps -ef | grep oms

    3. Delete every .lok file you find under the WebLogic domain directory
     cd /u00/app/gc/gc_inst/user_projects/domains/GCDomain
     find . -name "*.lok"

    In our case, these files were:
    ../gc_inst/user_projects/domains/GCDomain/config/config.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/data/ldap/ldapfiles/EmbeddedLDAP.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/tmp/EMGC_OMS1.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/data/ldap/ldapfiles/EmbeddedLDAP.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/tmp/EMGC_ADMINSERVER.lok

    4. Start OMS
     /u00/app/gc/oms12cr3/oms/bin/emctl start oms

    5. Check the log files after the restart
     tail -f /u00/app/gc/gc_inst/NodeManager/emnodemanager/nodemanager.log
     tail -f /u00/app/gc/gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/logs/EMGC_ADMINSERVER.out
     tail -f /u00/app/gc/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs/EMGC_OMS1.out
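
The find command in step 3 only lists the .lok files; the removal itself is left to the reader. A minimal sketch of how the listing and the removal could be combined is shown here. The function name remove_lok_files is hypothetical and not part of any Oracle tooling, and the sketch assumes that every WebLogic and OMS process has already been stopped or killed as in steps 1 and 2.

```shell
# Hypothetical helper (not an Oracle tool): list, then delete, every .lok
# file under a WebLogic domain directory.
# Assumption: no WebLogic or OMS process is still running (steps 1 and 2).
remove_lok_files() {
    domain_dir="$1"
    if [ ! -d "$domain_dir" ]; then
        echo "Not a directory: $domain_dir" >&2
        return 1
    fi
    # Print the lock files first, so there is a record of what was removed.
    find "$domain_dir" -type f -name "*.lok" -print
    # Then remove them; files with any other extension are left untouched.
    find "$domain_dir" -type f -name "*.lok" -exec rm -f {} +
}

# Example, using the domain path from this post:
# remove_lok_files /u00/app/gc/gc_inst/user_projects/domains/GCDomain
```

Printing before deleting keeps a trace in the terminal scrollback, which is handy if the list of lock files needs to be compared against the one given in step 3.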

The best matching Oracle Documents about this incident are:
  • ID 943790.1: What are the .lok Files Used For in a WebLogic Server (WLS) Domain? In general, these files are a mechanism to ensure file and server locks and to prevent a server from being booted twice.
  • ID 957377.1: Weblogic Fails To Start With Error "Unable To Obtain Lock"
  • ID 1235753.1: 11g Grid Control: OMS Startup Shows "AdminServer Could Not Be Started" but OMS is able to Startup