Tuesday, February 17, 2009

SOA Suite 10.1.3.4 on WebLogic HA installation

We recently did a simple HA installation of SOA Suite 10.1.3.4 on WebLogic, following the Enterprise Deployment Guide. It went quite smoothly, without many issues. However, we had to redo it once, because one key requirement of this installation is that the directory structure of SOA_HOME and BEA_HOME must be the same on both nodes.

We were doing this remotely, so to start with we got VNC access to both boxes. Also make sure ports 8001, 9700, 9701, and 9702 are open for you.
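A quick way to confirm the ports are reachable from your machine is a small TCP connect test. The following is only a sketch: the host names in the usage comment are placeholders for your own boxes, and a port will only report as open if something is already listening on it, so a "closed" result before the install may simply mean nothing has been started yet.

```python
import socket

# Ports mentioned above for this setup.
PORTS = [8001, 9700, 9701, 9702]


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def check_ports(host, ports=PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example usage (host names are placeholders, replace with your own):
# for host in ("apphost1", "apphost2"):
#     print(host, check_ports(host))
```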

The steps are quite straightforward:
1. Set up the RAC DB (we went for a single DB) and make sure all DB initialization parameters are set.
2. Install SOA Suite 10.1.3.1 and patch it to 10.1.3.4 (basic install). Remember that if you try to access EM after these steps it will not come up, because the ascontrol webapp is not enabled by default; you can enable it using this.
3. Install WebLogic, then run the WebLogic scripts to configure the domain, clusters, nodes, datasources, and deployapps.
A couple of things to note here: make sure to deploy the datasources to the cluster, and if any script fails with an error about being unable to get a lock, you can follow this. Also, start the SOA server from the Admin console, and remember that we don't need to use opmn start/stop here.
4. After setting up apphost2 as per the document, we were able to bring up all the consoles. We made one deviation from the document for the esb_dt setup: instead of setting 9700 in the esb_parameter table, we set 9702.
5. Next was the webhost and load balancer setup.

From a deployment perspective, we deploy our BPEL processes to both nodes and use round robin to load balance between them.

We will be exploring more to validate the setup and failover scenarios.

Update on 2/mar/09:
We faced a major issue when we installed one more SOA/HA/WebLogic environment and by mistake used the same multicast IP/port. We had to reinstall both setups, the earlier one and the new one. While starting server2 we got multicast errors such as:
Error: Cluster: BEA-000110 Multicast socket receive error: java.io.EOFException
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:380)
at java.io.ObjectInputStream$BlockDataInputStream.readLong(ObjectInputStream.java:2744)
at java.io.ObjectInputStream.readLong(ObjectInputStream.java:941)
at weblogic.cluster.HeartbeatMessage.readExternal(HeartbeatMessage.java:55)
Truncated. see log file for complete stacktrace


So one has to be very careful that the multicast IP/port is unique for every install, and get it right at install time itself; changing it later on doesn't help.
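One way to catch this before installing is to keep a list of the multicast IP/port used by every environment and check it for duplicates. The sketch below assumes such a list exists; the environment names, addresses, and ports in the usage comment are made up for illustration and are not the values from our setups.

```python
import ipaddress
from collections import defaultdict


def find_multicast_conflicts(environments):
    """Return {(address, port): [environment names]} for pairs used more than once.

    environments maps an environment name to an (address, port) tuple.
    """
    seen = defaultdict(list)
    for name, (address, port) in environments.items():
        ip = ipaddress.ip_address(address)
        if not ip.is_multicast:
            raise ValueError(f"{name}: {address} is not a multicast address")
        seen[(str(ip), int(port))].append(name)
    # Keep only the pairs shared by two or more environments.
    return {pair: names for pair, names in seen.items() if len(names) > 1}


# Example usage (hypothetical values):
# planned = {
#     "soa_ha_1": ("239.192.0.1", 7001),
#     "soa_ha_2": ("239.192.0.1", 7001),
# }
# print(find_multicast_conflicts(planned))
```

An empty result means no two environments in the list share the same pair. It says nothing about other software on the network that may be using the same multicast group.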

Update on 13/mar/09:

While our WebLogic cluster was working fine, we found that accessing one of the BPEL consoles returned 404 - Not Found. This made us look into BPEL clustering on top of WebLogic clustering. The SOA-HA FAQ document is very useful for understanding this. Basically, the BPEL nodes talk to each other to sync processes, any changes in bpel.xml, adapter state, etc., and they use the jGroups configuration to do this. The collaxa-config xml has to be updated with 4 changes as per the EDG. The jGroups changes are a little tricky: the jgroups-protocol xml file has to have a unique multicast IP address (when using UDP, not TCP), and if the servers have multiple IPs, bind_to_all_interfaces can be set to false and bind_addr added with the one IP. Our jGroups settings seem fine and the BPEL nodes propagate new processes without a restart, but we still have the 404 issue, which seems to have something to do with the network IPs.
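To make the jGroups part more concrete, here is a rough sketch of a sanity check for the UDP settings described above. The sample XML is a simplified, hypothetical layout with made-up addresses, not a copy of the shipped jgroups-protocol file. The attribute names mcast_addr, mcast_port, bind_addr, and bind_to_all_interfaces are standard jGroups UDP attributes, but check them against your own file.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; the real file has more protocols and attributes.
SAMPLE = """<config>
  <UDP mcast_addr="239.192.0.10" mcast_port="45788"
       bind_to_all_interfaces="false" bind_addr="192.0.2.11"/>
</config>"""


def check_jgroups_udp(xml_text):
    """Return a list of problems found in the UDP element (empty list if none)."""
    root = ET.fromstring(xml_text)
    udp = next(root.iter("UDP"), None)
    if udp is None:
        return ["no UDP element found (is the stack using TCP?)"]

    problems = []
    mcast = udp.get("mcast_addr")
    if mcast is None:
        problems.append("mcast_addr is not set")
    else:
        try:
            if not ipaddress.ip_address(mcast).is_multicast:
                problems.append(f"mcast_addr {mcast} is not a multicast address")
        except ValueError:
            problems.append(f"mcast_addr {mcast} is not a valid IP address")

    # On multi-IP servers: bind_to_all_interfaces=false needs an explicit bind_addr.
    bind_all = (udp.get("bind_to_all_interfaces") or "").lower()
    if bind_all == "false" and not udp.get("bind_addr"):
        problems.append("bind_to_all_interfaces is false but bind_addr is missing")
    return problems


# Example usage:
# print(check_jgroups_udp(SAMPLE))
```

This only checks that the settings are consistent with each other. Whether the multicast address is unique across installs still has to be verified separately.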