Friday, December 19, 2014

OSB-SOA-OSB zig zag pattern

Recently we ran into performance problems with some of our services that follow an OSB-SOA-OSB pattern. Most of our services follow an OSB-SOA pattern, but for a few that use the Tuxedo transport we ended up with this zig-zag, because OSB has a Tuxedo transport while SOA Suite does not.

As we all know, OSB is stateless while SOA is generally stateful and needs a DB. Services following an OSB+SOA pattern also need a consistent logging solution. These custom logging solutions are generally JMS-based asynchronous solutions where a listener picks up the log messages and writes them to the DB.

There are multiple challenges in an OSB-SOA-OSB pattern:

1) If the OSB and SOA domains are separate, there will be network hops on every call from OSB to SOA and back. We can use SOA direct (t3) based communication, but it has its own challenges.

Some of the challenges are:
a) if you are using t3 you cannot use a load balancer URL, so you have to be careful to list all managed server node URLs (and test that load balancing is actually happening)
b) you cannot set a timeout on these calls - mostly the JTA timeout takes effect? (see the sketch after this list)
c) there can be additional complications if you use an OWSM policy at the endpoints
d) transaction behavior can also be a challenge
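
On point (b), the domain-level JTA timeout is the knob that usually ends up governing these calls. A minimal WLST sketch for adjusting it, assuming a hypothetical admin URL, credentials, domain name and timeout value (none of these are from our environment):

# set_jta_timeout.py - run with wlst.sh; URL, credentials, domain name and value are assumptions
connect('weblogic', 'welcome1', 't3://adminhost:7001')
edit()
startEdit()
# the JTA settings live under /JTA/<domain_name>
cd('/JTA/my_domain')
# this global transaction timeout effectively bounds soa-direct calls
cmo.setTimeoutSeconds(120)
save()
activate(block='true')
disconnect()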

2) Having a consistent logging solution. Since it is better to log as near to the source as possible, we should deploy a common solution for both OSB and SOA, which means setting up JMS queues in both the OSB and SOA domains and deploying the logging code in both environments (a sketch of the queue setup follows).
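
For the logging queues, a WLST sketch along these lines (run once per domain) would create the JMS server, module and queue. Every name, target and JNDI path below is a placeholder to illustrate the shape of the setup, not our actual configuration:

# create_logging_jms.py - run with wlst.sh in each domain (OSB and SOA)
# all server names, queue names and JNDI names are hypothetical
connect('weblogic', 'welcome1', 't3://adminhost:7001')
edit()
startEdit()
cd('/')
jmsServer = cmo.createJMSServer('LoggingJMSServer')
jmsServer.addTarget(getMBean('/Servers/osb_server1'))
jmsModule = cmo.createJMSSystemResource('LoggingJMSModule')
jmsModule.addTarget(getMBean('/Servers/osb_server1'))
# pin destinations in this module to the JMS server
sub = jmsModule.createSubDeployment('LoggingSubDeployment')
sub.addTarget(jmsServer)
cd('/JMSSystemResources/LoggingJMSModule/JMSResource/LoggingJMSModule')
queue = cmo.createQueue('LoggingQueue')
queue.setJNDIName('jms/LoggingQueue')
queue.setSubDeploymentName('LoggingSubDeployment')
cf = cmo.createConnectionFactory('LoggingCF')
cf.setJNDIName('jms/LoggingCF')
cf.setDefaultTargetingEnabled(true)
save()
activate(block='true')
disconnect()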

3) The complete solution might not scale well. The threading models of OSB and SOA are very different, and when you chain OSB to SOA to OSB, with multiple such calls in flight, thread-starvation or deadlock situations will not be surprising.


What was our problem?

We faced huge latency in our response times. It turned out we were using the Publish activity in OSB to do our logging; however, Publish is not really asynchronous when you publish to a proxy service.

This blog post helped us confirm this.

We were publishing to a proxy service which in turn published to a JMS-based business service; if the JMS configuration is not correct, the whole request thread waits for a few seconds.
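
One quick way to see whether the logging destination is the culprit is to look at its runtime counters: if the consumer count is zero or the backlog keeps growing, the publish path is stalling on JMS. A hedged WLST sketch, with the managed server and JMS server names purely as placeholders:

# check_logging_queue.py - monitoring only, safe against a live server
# 'osb_server1' and 'LoggingJMSServer' are assumed names
connect('weblogic', 'welcome1', 't3://adminhost:7001')
domainRuntime()
cd('/ServerRuntimes/osb_server1/JMSRuntime/osb_server1.jms/JMSServers/LoggingJMSServer')
for dest in cmo.getDestinations():
    # a growing pending count or zero consumers points at a misconfigured listener
    print dest.getName(), 'pending=', dest.getMessagesCurrentCount(), 'consumers=', dest.getConsumersCurrentCount()
disconnect()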

The other problem we had was the tuning of OSB and SOA. There is a lot of material available on tuning; however, a few critical learnings are:

1) OSB - using work managers is an absolute must to stop a problematic service from getting out of proportion and bringing down the node. A work manager helps contain a spike in one service so it does not consume all the resources and starve the others.

How many max threads to assign, whether to assign work managers to all services or only some, and whether to assign them only to proxy services or to both proxy and business services - these are areas you have to tune and test.
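
As a starting point, here is a minimal WLST sketch that creates a work manager with a max-threads constraint and targets it at one managed server. The names, the target and the count of 20 are all assumptions to illustrate the idea; in OSB you would then select this work manager as the dispatch policy on the relevant proxy or business service.

# create_osb_workmanager.py - run with wlst.sh against the OSB domain
# domain name, server name and the thread count are placeholders
connect('weblogic', 'welcome1', 't3://adminhost:7001')
edit()
startEdit()
cd('/SelfTuning/my_domain')
# cap the threads a hot service can grab so it cannot starve everything else
maxThreads = cmo.createMaxThreadsConstraint('CriticalServiceMaxThreads')
maxThreads.setCount(20)
maxThreads.addTarget(getMBean('/Servers/osb_server1'))
wm = cmo.createWorkManager('CriticalServiceWM')
wm.addTarget(getMBean('/Servers/osb_server1'))
wm.setMaxThreadsConstraint(maxThreads)
save()
activate(block='true')
disconnect()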


2) SOA - when audit was set to 'Off' at the soa-infra level, we noticed at least a 1-second drop in response time. SOA DB connections and the use of GridLink data sources are also important tuning parameters.
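
For reference, the audit level can also be flipped without the EM console by setting the soa-infra config MBean over JMX from WLST. The ObjectName pattern below (in particular the Location= part) is an assumption and should be verified with an MBean browser in your own environment before use:

# set_soa_audit_off.py - run with wlst.sh; verify the ObjectName for your install first
from javax.management import ObjectName, Attribute
connect('weblogic', 'welcome1', 't3://adminhost:7001')
# the soa-infra config MBeans are registered in the runtime MBean server
domainRuntime()
# assumed ObjectName - Location is usually the managed server running soa-infra
onm = ObjectName('oracle.as.soainfra.config:Location=soa_server1,name=soa-infra,type=SoaInfraConfig,Application=soa-infra')
mbs.setAttribute(onm, Attribute('AuditLevel', 'Off'))
print 'AuditLevel is now', mbs.getAttribute(onm, 'AuditLevel')
disconnect()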

Some interesting DB queries to check SOA tablespace sizes are here, and queries for the time taken by SOA components are here.

In summary, we managed to improve response time and throughput, but OSB-SOA-OSB patterns will always carry a bit of overhead compared to an OSB-only or SOA-only option.

My recommendation would be to have OSB+SOA in the same domain/node and leverage the SOA direct transport to co-locate as much processing as possible; that would be a much faster option. I believe that with 12c such a domain topology might become more popular.
