Friday, December 19, 2014

SOA composite deployment coherence issue

While deploying a composite to a SOA cluster, the deployment was stuck for more than 20 minutes. To make it worse, when the deployment was cancelled and retried, it corrupted the MDS, causing soa-infra to fail on restart.

The logs clearly showed a STUCK THREAD:

<[STUCK] ExecuteThread: '56' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "602" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 602461 ms
[
POST /soa-infra/deployer HTTP/1.1
Connection: TE
TE: trailers, deflate, gzip, compress
User-Agent: Oracle HTTPClient Version 10h
Accept-Encoding: gzip, x-gzip, compress, x-compress
ECID-Context: 
Authorization: Basic amNoZW42Ol8xYW1BZG1pbg==
Content-type: application/octet-stream
Content-Length: 69483

]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-701 "[STUCK] ExecuteThread: '56' for queue: 'weblogic.kernel.Default (self-tuning)'" {
    -- Waiting for notification on: java.util.HashMap@4343a522[fat lock]
    java.lang.Object.wait(Object.java:???)
    oracle.integration.platform.blocks.deploy.CoherenceCompositeDeploymentCoordinatorImpl.submitRequestAndWaitForCompletion(CoherenceCompositeDeploymentCoordinatorImpl.java:352)
    oracle.integration.platform.blocks.deploy.CoherenceCompositeDeploymentCoordinatorImpl.coordinateCompositeRedeploy(CoherenceCompositeDeploymentCoordinatorImpl.java:255)
    oracle.integration.platform.blocks.deploy.servlet.BaseDeployProcessor.overwriteExistingComposite(BaseDeployProcessor.java:487)
    oracle.integration.platform.blocks.deploy.servlet.BaseDeployProcessor.deploySARs(BaseDeployProcessor.java:298)
    ^-- Holding lock: java.lang.Object@73823526[thin lock]


The soa-infra error:

weblogic.application.ModuleException: [HTTP:101216]Servlet: "FabricInit" failed to preload on startup in Web application: "/soa-infra".
oracle.fabric.common.FabricException: Error in getting XML input stream: oramds:/deployed-composites/AccountBS_rev1.0/composite.xml: oracle.mds.exception.MDSException: MDS-00054: The file to be loaded oramds:/deployed-composites/AccountBS_rev1.0/composite.xml does not exist.


In case of the soa-infra error, this blog has steps on how to recover.

The deployment STUCK THREAD issue points to Coherence-related problems; there are many useful troubleshooting documents on Oracle Support:

General Coherence Network Troubleshooting And Configuration Advice (Doc ID 1389045.1)

Coherence and SOA Suite Integration Recommendations (Doc ID 1557370.1)

Troubleshooting Tips for Coherence - Oracle Service Oriented Architecture (SOA) Suite Integration Issues (Doc ID 1388786.1)

"oracle.integration.platform.blocks.deploy.CoherenceCompositeDeploymentCoordinatorImpl.submitRequestAndWaitForCompletion" Error and Slow Response While Accessing Composites In EM Console (Doc ID 1437883.1)

SOA 11g Composite Deployment Results in Stuck Thread Error: <[STUCK] ExecuteThread - Unable to Deploy the Composites in a Cluster (Doc ID 1086654.1)

SOA 11g Health Check: Verify Consistency of Coherence wka and wka.port Configuration (Doc ID 1578203.1)

SOA 11g: How Many Nodes are Required to be Specified as Coherence WKA Members in a SOA/OSB Cluster? (Doc ID 1511706.1)

Stuck Threads during SOA Cluster Deployment (Doc ID 1564586.1)

IpMonitor Failed To Verify The Reachability Of Senior Member (Doc ID 1530288.1)



OSB-SOA-OSB zig zag pattern

Recently we ran into performance problems with some of our services that follow an OSB-SOA-OSB pattern. Most of our services follow an OSB-SOA pattern, but for the ones that use the Tuxedo transport we end up with this zig-zag, since OSB has the Tuxedo transport and SOA Suite does not.

As we all know, OSB is stateless while SOA generally is not and needs a DB. Services following an OSB+SOA pattern also need a consistent logging solution. These custom logging solutions are generally JMS-based asynchronous solutions where a listener picks up the log messages and writes them to the DB.
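As a rough illustration of the listener side of such a solution, here is a minimal sketch of a message-driven bean that drains the log queue and inserts rows via JDBC; the queue, data source, and table names are made up for the example.

import java.sql.Connection;
import java.sql.PreparedStatement;

import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import javax.sql.DataSource;

// Hypothetical listener: drains the shared log queue and writes each entry to a DB table.
@MessageDriven(mappedName = "jms/LogQueue", activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue")
})
public class LogListenerBean implements MessageListener {

    @Resource(mappedName = "jdbc/LogDS") // assumed data source JNDI name
    private DataSource ds;

    @Override
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO SERVICE_LOG (LOG_TIME, LOG_MSG) VALUES (SYSTIMESTAMP, ?)")) {
                ps.setString(1, payload);
                ps.executeUpdate();
            }
        } catch (Exception e) {
            // Rethrowing forces redelivery; a real solution would also handle poison messages.
            throw new RuntimeException(e);
        }
    }
}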

There are multiple challenges in an OSB-SOA-OSB-SOA pattern:

1) If the OSB and SOA domains are separate, there will be network hops when calling from OSB to SOA and the other way round. We can use SOA direct (t3) based communication, but it has its own challenges.

Some of those challenges are:
a) if you are using t3 you cannot use a load balancer URL, so you have to be careful to list all the managed server node URLs (and test that load balancing is actually happening)
b) you cannot set a timeout on these calls (mostly the JTA timeout takes effect?)
c) there could be additional complications if you use an OWSM policy at the endpoints
d) transaction behavior could also be a challenge

2) Having a consistent logging solution. Since it is better to log as close to the source as possible, we should deploy a common solution for both OSB and SOA, which means setting up JMS queues in both the OSB and SOA domains and deploying the logging code in both environments.

3) The complete solution might not scale well, as the threading models of OSB and SOA are very different; when you have OSB calling SOA calling OSB, and many such calls in flight, thread deadlock situations would not be surprising.


what was our problem?

We faced huge latency in our response time. It turned out we were using the publish activity in OSB to do our logging; however, publish is not really asynchronous if you publish to a proxy service.

this blog helped us to confirm this.

We were publishing to a proxy service which in turn published to a JMS-based business service; if the JMS configuration is not correct, the whole thread waits for a few seconds.

The other problem we had was tuning OSB and SOA. There is a lot of material available on tuning; however, a few critical learnings are:

1) OSB - using work managers is an absolute must to avoid a problematic area growing out of proportion and bringing down the node. A work manager helps contain any spike a service might have, so it does not consume all the resources and starve the others.

How many max threads to assign, whether to assign work managers to all services or only to some, and whether to assign them only to proxy services or to both proxy and business services - these are again areas to tune and test.


2) SOA - when audit is turned 'Off' at the soa-infra level, at least a 1-second drop in response time was noticed; SOA DB connections and the use of GridLink data sources are also important tuning parameters.

Some interesting DB queries to check SOA tablespace size are here, and queries for the time taken by SOA components are here.

In summary, we managed to improve response time and throughput, but OSB-SOA-OSB patterns will always carry a bit of overhead compared to an OSB-only or SOA-only option.

My recommendation would be to have OSB+SOA in the same domain/node and then leverage the SOA direct transport to co-locate as much processing as possible; that would be a much faster option. I believe with 12c such a domain topology might become more popular.

Sunday, October 05, 2014

The Hadoop Puzzle

"Big data is at the foundation of all the megatrends happening today" - when I saw this here - It made perfect sense, with all the different applications of the hadoop technology. I try to explore a few trends in this blog.

hadoop core
At the heart of Hadoop technology there are two parts:
  • The distributed file system or HDFS
  • The data processing part using MR (Map-Reduce) on HDFS
And the overall master-worker clustering technology that makes everything work 

where does hadoop fit in the new world?
Typically we have applications in the pattern of OLTP + OLAP.
OLTP is the transactional part, based on an RDBMS.
OLAP is the DW + analytics part, based on an RDBMS or on advanced MPP databases such as Teradata.
So the typical flow of data is:
OLTP DB --> (ETL) --> DW --> Analytics and Reporting
When you put Hadoop into this flow, you get at least three types of new flows:
1.     OLTP DB, Other sources --> (ETL) --> Hadoop --> DW --> Analytics and Reporting
2.     OLTP DB, Other sources --> (ETL) --> Hadoop --> No SQL DB --> Online Applications
3.     OLTP DB, Other sources --> (ETL) --> Hadoop --> real-time Analytics
This can also be explained as
1.     batch processing
2.     online processing
3.     real-time processing

batch processing
This is the classic application of MR (Map-Reduce) to processing HDFS data; MR code is written in Java. Pig Latin is a language developed by Yahoo to generate MR code.
Hive is a SQL-like database on top of HDFS that also uses MR.
Mahout is a machine learning library on top of HDFS that also uses MR.
This application basically fills a gap in existing technologies for processing large datasets.
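For context, a minimal MR job written against the Java "mapreduce" API looks roughly like the classic word count below; treat it as a sketch rather than a tuned example.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits (word, 1) for every word in the input split stored on HDFS.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: sums the counts for each word across all mappers.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}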

online processing
This is a scenario where a NoSQL database such as HBase or Cassandra is used. NoSQL databases are distributed databases, unlike an RDBMS, and hence can scale out almost without limit. HBase is based on HDFS but doesn't use MR; Cassandra can be standalone or based on HDFS. Such a data store can be used as the backend of a web application; web log analysis can be based on such an architecture.
This application basically fills a gap in existing technologies for storing large datasets for fast online access.
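To make "fast online access" concrete, here is a rough sketch of key-based reads and writes using the HBase Java client of that era (0.9x-style API); the table, column family, and row key are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of direct key-based reads/writes against an assumed 'weblog' table.
public class WeblogStore {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "weblog");          // assumed table name

        // Write one row: row key = user + date, column family 'hits'.
        Put put = new Put(Bytes.toBytes("user42-20141005"));
        put.add(Bytes.toBytes("hits"), Bytes.toBytes("url"), Bytes.toBytes("/home"));
        table.put(put);

        // Read it back by row key - no MapReduce involved for point lookups.
        Result result = table.get(new Get(Bytes.toBytes("user42-20141005")));
        System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("hits"), Bytes.toBytes("url"))));

        table.close();
    }
}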

real-time processing
This is the in-memory option for faster data processing, using products such as Spark and Storm on top of HDFS; naturally they don't use MR. Spark can also run against Cassandra without using HDFS.
This is the most exciting application of Hadoop, as it really can enable many new application styles and trends.
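As an illustration, the same word count as in the MR example looks roughly like this with the Spark 1.x Java API; the HDFS paths are made up, and the point is that the intermediate data stays in memory between stages.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Same word count as the MR example, expressed as in-memory RDD transformations.
public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("wordcount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // The input can live on HDFS, but no MR jobs are generated for these steps.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt");   // assumed path
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")))
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile("hdfs:///data/wordcount-output");          // assumed path
        sc.stop();
    }
}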

In summary, many permutations and combinations of the Hadoop core can create many kinds of applications and megatrends.

(My apologies for not expanding many of the acronyms and not providing links; please Google any term you find interesting, and please leave me a comment if you have questions.)

Thursday, April 24, 2014

few more OSB stories

Recently we finally went live with an OSB service which, on average, takes 11 seconds (it does a lot internally) with 100K+ transactions per day. We had two problems during this effort.


1. StackOverflow Error

There were quite a few StackOverflow errors, which turned out to be because of the XQuery function below:

declare function xf:escape-for-regex( $arg as xs:string? ) as xs:string {
   (: escape regex metacharacters in $arg so it can be used literally inside a regex pattern :)
   replace($arg, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))','\\$1')
} ;

This function was used to retrieve a certain error message from a complex, structured error response; however, with some types of error responses it went into an infinite loop, causing StackOverflow errors.

What was interesting is that WebLogic Server survived these errors quite gracefully, without causing too much trouble, even with 100+ such errors happening every 10 minutes.

What's even more interesting is that we didn't see this error in the dev environments, so it was a bit inconclusive whether the error was due to a certain resource constraint or purely due to the function.

2. Runtime-configuration

One thing I struggled with in OSB is setting global parameters that can be changed at run time (BPEL has support for runtime global parameters); customization files don't allow any custom properties other than the ones they support. The only option was to use some kind of property file and read it through Java callouts. One option we used was XQuery files to define these properties, but it was not possible to change those during deployment or at runtime. Another option is to keep the values in a database table and modify them through some kind of script or screen.
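As a minimal sketch of the property-file approach (assuming the OSB Java callout invokes a static method; the class name, file path, and keys are all illustrative):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper invoked from an OSB Java callout to resolve runtime parameters.
public final class RuntimeConfig {

    private static volatile Properties props;

    private RuntimeConfig() {
    }

    // Called from the OSB pipeline via a Java callout; returns a default if the key is missing.
    public static String getProperty(String key, String defaultValue) {
        return load().getProperty(key, defaultValue);
    }

    private static Properties load() {
        if (props == null) {
            synchronized (RuntimeConfig.class) {
                if (props == null) {
                    Properties p = new Properties();
                    // Assumed location; in practice this could come from a system property.
                    try (FileInputStream in = new FileInputStream("/u01/config/osb-runtime.properties")) {
                        p.load(in);
                    } catch (IOException e) {
                        // Fall back to empty properties so callers get their defaults.
                    }
                    props = p;
                }
            }
        }
        return props;
    }
}

Since this sketch caches the file on first use, changing a value needs a cache reset or a managed server restart; the database-table variant avoids that limitation.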

Monday, March 17, 2014

How I built my first mobile app



When I thought of building an app, the first question was what platform to use: PHP, Java, Node.js, etc. I wanted a mobile-first approach, and for mobile I picked the hybrid app approach: a web app built using HTML/CSS/JavaScript that can nevertheless run as a native app.

For the backend I picked Java EE, as this is the platform I am most comfortable with. I picked NetBeans as my IDE, with the bundled GlassFish 3 container.

So the steps for creating the app went something like this:
  • Created the database model and tables in MySQL (MySQL Workbench is a great help)
  • Used JPA to create the entities (the NetBeans 'create entity' wizard automated the whole thing)
  • Used REST (Jersey) to create the REST services (the NetBeans 'create REST from entity' wizard automated the whole thing)

Creating and testing the backend was fun. I created a session bean for my business methods using multiple JPA entities and then called the session bean methods from the REST service operations. I tested the REST services from a Chrome app called 'Advanced REST Client'.
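Not the exact code the wizards generate, but roughly the shape of the pieces; the entity, resource path, and persistence unit name below are illustrative.

import java.util.List;

import javax.ejb.Stateless;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.annotation.XmlRootElement;

// Minimal JPA entity mapped to a MySQL table (fields trimmed for the sketch).
@Entity
@XmlRootElement
public class Item {
    @Id
    @GeneratedValue
    private Long id;
    private String name;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// EJB-backed JAX-RS resource exposing the entity over REST; Jersey handles the JSON.
@Stateless
@Path("items")
class ItemResource {
    @PersistenceContext(unitName = "myappPU") // assumed persistence unit name
    private EntityManager em;

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public List<Item> list() {
        return em.createQuery("SELECT i FROM Item i", Item.class).getResultList();
    }

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    public void add(Item item) {
        em.persist(item);
    }
}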

Then it was time for the front end. NetBeans helps create HTML5 projects seeded with AngularJS, so I used that; this video helped me get some context on AngularJS. Once I had built a basic UI and the AngularJS controller to call the REST services, it was time to test locally.

The first issue I got was
XMLHttpRequest cannot load Origin http://localhost:8383 is not allowed by Access-Control-Allow-Origin

This was a big issue for me. A lot of material is available on fixing CORS, but finally I got an excellent solution here; by just using this CORS filter, I didn't have to change anything in the backend, and all GET and POST requests started working fine.
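For reference, a minimal servlet CORS filter looks roughly like the sketch below (the solution linked above is more complete); the allowed methods and headers here are illustrative.

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;

// Adds the CORS headers the browser checks before letting the Angular app call the REST API.
@WebFilter("/*")
public class CorsFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) {
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse res = (HttpServletResponse) response;
        // "*" is fine for local testing; lock this down to the real front-end origin in production.
        res.setHeader("Access-Control-Allow-Origin", "*");
        res.setHeader("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS");
        res.setHeader("Access-Control-Allow-Headers", "Content-Type, Authorization");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
    }
}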

Now it was time to deploy the app. I deployed my backend app on CloudBees; I really liked the CloudBees run@cloud service. After installing the SDK, with just two commands my app was up and running:

bees app:deploy -a avijeetd/myapp -t glassfish3 D:\NetBeansProjects\myapp\dist\myapp.war

bees app:bind -db myapp -a myapp -as myapp

CloudBees provides a free DB (up to 5 MB) and free deployment of one application.

For the mobile app, I used PhoneGap; the PhoneGap Build service can convert the HTML/CSS/JS application into native packages, such as an APK (Android application package) file. I used GitHub to host the source code, again a free service for public repositories.

So by using all these free cloud services, I was finally happy to see my app on an Android phone. Now it's time to add more features to the app, such as security (authentication), multi-user support in the database, etc., and then maybe I can tell you to download the app from the marketplace :-)