
The curious case of the MIA work orders?

F5 - Redirect users to a maintenance page

Working in IT, we deal with strange issues all the time. However, every once in a while, something comes up that leaves us scratching our heads for days. One such issue happened to us a few years back. It came back to me recently, and this time I thought I should note it down.

Summary

  • Maximo – TechnologyOne integration error: work orders went missing.
  • There were no traces of the problem; everything appeared to be working fine.
  • The root cause: the F5 Load Balancer returned a maintenance page with an HTTP 200 status code, leading Maximo to believe the outbound message had been received successfully by WebMethods.

The mysterious missing work orders

The issue was first reported to us when a user raised a ticket about missing work orders in TechnologyOne, the finance management system used by our client. Without work orders created in TechOne, users cannot report actual labour time or other costs, so this was treated as a high-priority issue.

F5 maintenance page for integration should not have HTTP 200 OK status

Integration Background

TechOne is integrated with Maximo using WebMethods, an enterprise integration platform. Unlike direct point-to-point integration, these types of problems are usually easy to deal with when an enterprise integration tool is involved: we look at the transaction log, identify the failed transactions and what caused them, fix the issue, and then resubmit the message. All good integration tools have these fundamental capabilities.

In this case, we looked at WebMethods' transaction history and couldn't find any trace of the missing work orders. We also spent quite some time digging through the log files of each server in the cluster but couldn't find anything relevant. That made sense: if there had been an error, it would have been picked up, and the system would have raised alarms and email notifications through the several overlapping monitoring channels we had set up for this client.

Clueless

On the other hand, when we looked at Maximo's Message Tracking and log files, everything looked normal, with work orders published to WebMethods correctly and without interruption. In other words, Maximo said it had sent the message, while WebMethods said it never received anything. This left us in limbo for a few days. And of course, when we had no clue, we did what we application people do best: we blamed the network guys.

The network team couldn't find anything strange in their logs either. So we let the issue slip for a few days without any real progress. During this time, users kept reporting new missing work orders, not knowing that I wasn't really doing any troubleshooting work; I was staring at the screen mindlessly all day long.

Light at the end of the tunnel

Then, of course, when you stare at something long enough, the problem reveals itself. With enough work orders reported, it became clear that updates only went missing during the period between 9 and 11 PM, regardless of the type of work order or the data entered. When this pattern was mentioned, it didn't take long for someone to point out that this is usually the window when IT does its Windows patching.

When a server is being updated, IT sets the F5 Load Balancer to redirect any user request to a "Site Under Maintenance" page, which makes sense for a normal user accessing the service via a browser. The problem is that when Maximo published an integration message to WebMethods, it received the same web page. That by itself would be fine, since Maximo doesn't process the response body. However, the response came back with an HTTP 200 status, which is not fine in this case: since it was a 200 OK, Maximo assumed the message had been accepted by WebMethods and marked it as a successful delivery. WebMethods, on the other hand, never received the message.

Lesson Learned

The recommendation in this case is to set the status of the maintenance page to something other than HTTP 2xx. When Maximo receives a non-2xx status, it marks the message as a delivery failure, which means the administrator will be notified if monitoring is set up. The failed message will be listed as an error and can be resubmitted using the Message Reprocessing application.

Due to the complex communication chain involved, I never heard back from the F5 team on what exactly was done to rectify the issue. However, from a quick search, it looks like it can be achieved easily by updating the rule in F5.

This same issue recently came back to me, so I had to add it to my list of common issues with load balancers. I think it is also fun enough to deserve a separate post. This has been a lengthy story; if you made it this far, I hope it will be useful to you at some point.

How to deploy change for Maximo without downtime?

Downtime is costly to the business. As developers, avoiding it gives us a ton of benefits, both in efficiency and in personal well-being. For example, when making changes to a shared environment that would normally require downtime, I get my freedom back, since I don't have to ask for a window or wait until night to do the work.

With the introduction of Automation Script, most of the business logic and front-end changes we need to push to production nowadays can be done without downtime. Some of them are:

  • Automation Script
  • Escalation
  • Application Design
  • Conditions
  • Workflows

However, Database Configuration changes still need Admin Mode or a restart. 

In recent years, many of us have switched to DBC scripts to deploy changes. This approach takes more time to prepare than other methods, such as using Migration Manager or making the changes by hand, but it has proven very reliable and allows faster deployment with much less risk.

Many of us have probably also realized that, for small changes, we can run the DBC script directly while the system is live. But after that, we still need a quick restart. It doesn't matter whether it's a small environment that takes 5 minutes to restart or a massive cluster that needs 30: a restart is downtime, and any deployment that involves downtime is treated differently, with days or weeks of planning and rounds of approval and review.

For development, a colleague showed me a trick: instead of a restart, we can just turn Admin Mode on and off. As part of this process, Maximo's cache is refreshed and the changes take effect. This works quite well in some instances. However, it is still downtime and can't be used in Production; on a big cluster, turning on Admin Mode often takes more time than a restart.

Another colleague hinted at a different method, and this is what I ended up with. I have been using it for a while now and can report that it is quite useful. Not only has my productivity improved, it has also proven valuable a few times when I didn't have to approach cloud vendors to ask for downtime or a restart.

The approach is very simple: when I have a change that requires a restart, I script it using DBC. If the change is small, I can get away with Update/Insert SQL statements applied directly to the configuration tables, such as:

  • MAXATTRIBUTE/MAXATTRIBUTECFG
  • MAXOBJECT/MAXOBJECTCFG
  • SYNONYMDOMAIN
  • MAXLOOKUPMAP
  • Etc.

Next, I create a "super complex" automation script named refreshmaxcache (with no launch point):
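A minimal sketch of what it can look like (Jython; this assumes the public reloadMaximoCache() method on psdi.server.MXServer, which is worth verifying against your Maximo version):

# refreshmaxcache - API automation script (no launch point), Jython
# Reloads Maximo's in-memory caches so that changes applied directly to the
# configuration tables take effect without a restart.
# Assumption: MXServer exposes the public reloadMaximoCache() method.
from psdi.server import MXServer

MXServer.getMXServer().reloadMaximoCache()

# responseBody is the implicit variable returned to the API caller
responseBody = "Maximo cache refresh triggered"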

That's it. Every time you deploy a change, all you need to do is call the script via the REST API, using the following URL, to refresh the configuration:

https://[MAXIMO_ROOT]/maximo/oslc/script/refreshmaxcache

Note: this is not a bulletproof approach officially recommended by IBM. As such, if you use it in Production, make sure you understand the change and its impact. I only use it for small changes in areas where there is little or no risk of users writing data while the change is being applied. For a major deployment, for example a change to the WORKORDER table, it's a bad idea to apply it during business hours. For non-production, I don't see much risk involved.

A man who doesn’t work at night is a happy person.

How to run SQL query in Maximo without database access?

With the introduction of the Maximo Application Suite, I have had to deal with more and more Maximo environments in the cloud. This often means there is no access to the back end, such as the database or the WebSphere/OpenShift console. Sometimes, to troubleshoot issues, it is critical to be able to run queries against the database. In this post, I will introduce a new approach to accessing the database using an automation script.

From Maximo version 7.6.0.9, we can build custom APIs using automation scripts. This is a powerful new feature, yet it looks to be underutilized by the community.

The first obvious use case is that it gives us the freedom to build any API we want without being restricted by the limitations of the Maximo Integration Framework. For example, we can create an API that returns data in CSV or binary format, or use it to upload data and bypass the business layer.

Since it allows us to use the browser to interact with automation scripts, and the script framework has access to all the Java functions of the MBO layer, we can exploit it to execute all sorts of weird operations.

https://[MAXIMO_ROOT]/maximo/oslc/script/[SCRIPT_NAME]

In an article I posted a few days ago, I used an API script to call a Java function to refresh Maximo's sequence cache and avoid a restart. Using the same approach, we can do database configuration and deployment without downtime.

In this post, I'll demonstrate how we can use an API script to run SELECT, UPDATE, and DELETE SQL statements against the Maximo database without direct DB access. This can come in handy when DB access is restricted. Of course, we could use MXLoader to achieve the same result, but this method is a lot more convenient.

Creating an API script is very simple: we just need to create a script without a launch point. Then we can call it by accessing the URL shown above in the browser.

If you're already logged in and have a session, that is all it takes. Otherwise, to authenticate the request, you can pass in username and password parameters as you normally would when calling the REST API:

https://[MAXIMO_ROOT]/maximo/oslc/script/[SCRIPT_NAME]?_lid=[USERNAME]&_lpwd=[PASSWORD]
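For instance, the smallest possible API script is just one line that sets the implicit responseBody variable (a sketch; name the script whatever you want to appear in the URL):

# A minimal API automation script (no launch point), Jython.
# Whatever is assigned to the implicit responseBody variable is returned
# as the content of the HTTP response.
responseBody = "Hello from an automation script"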

To run a SELECT query on the database, I created a script named RUNSQL.
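A Jython sketch of such a script is below. The implicit request variable and its getQueryParam() method come from the API script framework; the DBManager connection handling is Maximo internal API, so treat those calls as assumptions to verify against your version:

# RUNSQL - API automation script (no launch point), Jython
# Reads 'method' and 'sql' from the query string and runs the SQL through
# Maximo's own connection pool. Lock this down to administrators before use.
from psdi.server import MXServer

mxs = MXServer.getMXServer()
method = request.getQueryParam("method")
sql = request.getQueryParam("sql")

conKey = mxs.getSystemUserInfo().getConnectionKey()
con = mxs.getDBManager().getConnection(conKey)
try:
    stmt = con.createStatement()
    if method == "SELECT":
        rs = stmt.executeQuery(sql)
        meta = rs.getMetaData()
        cols = meta.getColumnCount()
        lines = []
        header = []
        for i in range(1, cols + 1):
            header.append(str(meta.getColumnName(i)))
        lines.append(",".join(header))
        while rs.next():
            row = []
            for i in range(1, cols + 1):
                row.append(str(rs.getString(i)))
            lines.append(",".join(row))
        rs.close()
        # return the result set to the browser as CSV
        responseBody = "\n".join(lines)
    else:
        count = stmt.executeUpdate(sql)
        con.commit()
        responseBody = str(count) + " row(s) affected"
    stmt.close()
finally:
    mxs.getDBManager().freeConnection(conKey)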

To use the script to run a query, I type the SQL query directly into the sql parameter of the URL, as below:

https://[MAXIMO_URL]/maximo/oslc/script/runsql?method=SELECT&sql=SELECT top 10 assetnum,description,status FROM asset WHERE status = 'OPERATING'

In this case, the data is returned to the browser in CSV format.

To execute a query that does not return data (INSERT/UPDATE/DELETE):

https://[MAXIMO_URL]/maximo/oslc/script/runsql?method=DELETE&sql=DELETE maxsession WHERE userid = 'maxadmin'

Note: I have tested this against SQL Server. I haven't had a chance to test it against DB2 or Oracle databases.

How to reset sequence without restarting Maximo?

One error we often have to deal with is an incorrect sequence when adding new data to Maximo. There are many situations which can cause this issue, such as:

  • Loading data using MXLoader, or inserting data directly via SQL
  • Sequence corruption in Production due to an unknown cause, probably errors from a cancelled or terminated job
  • Restoring the database from a copy, or after an upgrade

When this happens, the user sees an error with a duplicate key value, such as:

BMXAA4211E - Database error number 2601 has occurred…

The solution is well-documented and straightforward: we just need to find the current maximum ID value used in the table and update the corresponding sequence to use the next value.

For example, if the error occurs with the WORKORDERID field of the WORKORDER table, we can do this SQL update and restart Maximo.

UPDATE maxsequence SET maxreserved = (SELECT max(workorderid) + 1 FROM workorder) WHERE tbname = 'WORKORDER' and name = 'WORKORDERID'

However, I like to avoid restarting Maximo when possible, for reasons like these:

  • I recently had to do a quick deployment that involved uploading some data. For some unknown reason, loading the data via MXLoader caused random sequence corruption a few times. For this client, which has a large cluster, restarting Maximo would require an additional 30-60 minutes of downtime.
  • A location hierarchy update required me to insert a few thousand new records into the LOCANCESTOR table, and I needed to update the sequence to a new value for subsequent data uploads via MIF to work. Since it is a cloud environment, avoiding a restart means we don't depend on the availability of the cloud provider.

To address that problem, the simplest solution I found to hot-reset the sequence cache without restarting Maximo is to call the sequence-reset Java function via an automation script. The steps are as follows:

  • Create a new script with no launch point:
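The original script calls the sequence-reset Java function; as a hedged sketch, a full cache reload achieves the same end (assuming psdi.server.MXServer exposes the public reloadMaximoCache() method in your version; substitute the dedicated reset call if you know it):

# runtask - API automation script (no launch point), Jython
# Sketch: after updating MAXSEQUENCE, reload Maximo's caches so the new
# sequence value is picked up without a restart.
# Assumption: MXServer.reloadMaximoCache() exists in your Maximo version.
from psdi.server import MXServer

MXServer.getMXServer().reloadMaximoCache()
responseBody = "Sequence cache reset triggered"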

Whenever we update the maxsequence table with a new value and need to reset the cache, we just execute the script by calling it via the REST API:

[MAXIMO_URL]/maximo/oslc/script/runtask?_lid=maxadmin&_lpwd=maxadmin

If it works correctly, you should see something like below.

Executing the reset sequence script by calling it via REST API

No restart during a deployment means we can all go to bed earlier. Best of luck.

UPDATE: In a clustered environment, I find it doesn't seem to refresh all the JVMs. To be sure, we might need to run it on each JVM separately (by accessing the script directly on each JVM's 908x port).
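For example (host and port are placeholders): http://[JVM_HOST]:9082/maximo/oslc/script/runtask?_lid=maxadmin&_lpwd=maxadmin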

Use Maximo webservice with JSON content

While the JSON API in newer versions of Maximo is quite useful, for many integration scenarios I still prefer the old API infrastructure with Publish Channels and Web Services. However, the native format for this feature is XML.

To send or receive JSON with a Publish Channel or Enterprise Service, we can translate the default XML to or from JSON before the message goes out of or comes into the system. Below is a simple example of how to set it up.

Set up a standard Publish Channel to send XML messages

  • Create a new Publish Channel: 
    • Name: ZZSR
    • Object Structure: MXSR
  • Create a new Enterprise Service:
    • Name: ZZSR
    • Object Structure: MXSR
  • Create a new HTTP End Point:  ZZWEBHOOK (To keep it simple, we use webhook.site. See more details here)
  • Create a new External System:
    • Name: ZZTEST
    • End Point: ZZWEBHOOK
    • Set standard queues.
    • In Publish Channels tab, add the ZZSR channel created above.
    • Enable the channel, enable the external system, and enable event listener on the channel
  • To test and verify our Publish channel is working:
    • Create a new Service Request, then save.
    • After less than a minute, Webhook should show a new message received in XML format

To send JSON instead of XML:

  • Update publish channel ZZSR, set Processing Class with value com.ibm.tivoli.maximo.fdmbo.JSONMapperExit
  • Go to the JSON Mapping application, create a new mapping:
    • Name: ZZTEST.ZZSR.OUT (it must follow this exact format <ExtSys>.<PublishChannel>.OUT)
    • Object Structure: MXSR
    • JSON Data: { "summary": "Test SR", "details": "Sample Details", "assetnum": "11430" }
  • Save the record, then go to the Properties tab and enter the mapping (see the illustration after this list).
  • To test that the mapping works, create another SR, and check Webhook to ensure it now receives the message in JSON format.
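As a hypothetical illustration of those properties (the attribute names are my assumption, not taken from the original screenshots), each JSON property is mapped to an attribute of the MXSR object structure along these lines:

    • summary ← DESCRIPTION
    • details ← DESCRIPTION_LONGDESCRIPTION
    • assetnum ← ASSETNUM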

To receive JSON via a Web Service, use similar steps to the above. The only key difference is that we have to name the JSON mapping <ExtSys>.<EntService>.IN, and use a mapping like the example below.
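For example, a hypothetical inbound mapping mirroring the outbound one above would be:

  • Name: ZZTEST.ZZSR.IN (following the <ExtSys>.<EntService>.IN format)
  • Object Structure: MXSR
  • JSON Data: { "summary": "Test SR", "details": "Sample Details", "assetnum": "11430" }
  • Properties: the same pairs as the outbound mapping, applied in the opposite direction (JSON property to object structure attribute)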

Message Engine doesn’t start after setting up cluster

This issue has hit me a few times and always takes me some time to figure out what happened, so I thought it's a good idea to note it down.

Symptom:

When setting up a clustered environment for Maximo, I need to set up a service integration bus with a message engine for each cluster (IF, UI, Cron, etc.).

Each message engine requires its own individual schema (and thus an individual user, if an Oracle DB is used).

After the integration buses are set up and the Maximo cluster is started, we see a lot of errors in the log files, usually in the Cron or MIF cluster, because the message engine is not available.

When restarting the cluster, we can see that the message engine for that cluster has a "partially started" status. But a few minutes after the whole cluster is started, the message engine shows an "unavailable" status.

 

Troubleshoot:

  • Check the ffdc logs under [Websphere_Home]\AppServer\profiles\ctgAppSrv01\logs\ffdc, looking in [MXServer_Name]_exception.log for any exceptions related to the integration bus or message engine, such as:

com.ibm.ws.sib.msgstore.persistence.DatasourceWrapperStoppedException com.ibm.ws.sib.msgstore.persistence.impl.PersistentMessageStoreImpl.start 1:206:1.47.1.53 D:\IBM\WebSphere\AppServer\profiles\ctgAppSrv01\logs\ffdc\MAXUI-N1-4_58bec328_22.10.24_15.50.24.7956943157914483024680.txt

com.ibm.ws.sib.msgstore.PersistenceException com.ibm.ws.sib.msgstore.impl.MessageStoreImpl.start 755 D:\IBM\WebSphere\AppServer\profiles\ctgAppSrv01\logs\ffdc\MAXUI-N1-4_58bec328_22.10.24_15.50.24.8271423565797649410081.txt
  • For each of the above exceptions, open the file it references; it may contain a more detailed error message that helps pinpoint the issue.
  • In this case, I had this vague error message:
CWSIS1501E: The data source has produced an unexpected exception: com.ibm.ws.sib.msgstore.persistence.DatasourceWrapperStoppedException: New connections cannot be provided because the persistence layer has been stopped

Solution:

  • In this specific case, the problem was caused by a failed earlier setup attempt, which left behind a set of tables created under different message engine details, and the current message engine refused to reuse them. To solve the issue, we can simply change the schema name for the message store so that, the next time the cluster starts, it creates a whole new set of tables for its own use.
  • If an Oracle database is used, the schema is tied to the DB account used to log in and is more difficult to change. In that case, I just drop all the tables owned by that schema. One quick way to generate the DROP statements for those tables is to run this query:

select 'drop table ' || table_name || ';' from user_tables;
