Batch Processing with Spring Batch

I have blathered on about Spring Batch a few times in the past.

The June 2009 edition of GroovyMag carried my article on how to use Spring Batch with Groovy; it’s republished (with permission) here. This was my second article for GroovyMag; I have also republished the first one, on Spring Integration with Groovy.

As I lurk on the Groovy and Grails mailing lists, I see a real need “out there” for this sort of infrastructure. Hopefully, this article will contribute to a small improvement in awareness. Writing code should, after all, be the last resort and not the first…

The source code is available, of course!

Batch Processing with Spring Batch
Dealing with Large Volumes of Data using Spring Batch and Groovy

Even though a major focus of modern ideas such as Service Oriented Architectures and Software As A Service is to facilitate and enhance live interactions between systems, batch processing remains as important and widely-used as ever: practically every significant project contains a batch processing component. Until the arrival of Spring Batch (version 1.0 was released in March, 2008), no widely available, open source, reusable architecture framework for batch processing had existed; batch processing had always been approached on an ad-hoc basis. This article will examine how Groovy joins with Spring Batch to ease the pain of dealing with large data sets.

A Brief Overview of Spring Batch

Spring Batch (SB) is a relatively new member of the Spring family of technologies. There is no better introduction than this excerpt from the documentation (see the “Learn More” section for the URL):

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications . . . Spring Batch builds upon the productivity, POJO-based development approach, and general ease of use capabilities people have come to know from the Spring Framework, while making it easy for developers to access and leverage more advance [sic] enterprise services when necessary . . . Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advance technical services and features that will enable extremely high-volume and high performance batch jobs through optimization and partitioning techniques. Simple as well as complex, high-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.

SB provides a great deal of out-of-the-box functionality: very flexible adapters for reading from flat files, facilities for dealing with JMS queues, JDBC-based database adapters (of course), along with a simple workflow ability allowing conditional, repetitive and parallel processing. Sophisticated error handling/recovery capabilities and simple job control round out the package. Following the standard Spring Framework convention of “don’t reinvent the wheel,” SB does not include a complete set of scheduling/job control tools, but works instead in conjunction with existing Spring-friendly schedulers such as Quartz and Tivoli. Over time, SB will make more use of other members of the Spring family, notably Spring Integration; this pairing should make for a formidable partnership.

SB was seeded and driven by technology and developers from Accenture–as well as SpringSource and the general Open Source community–and so claims to represent the distillation of a fair bit of experience with “real world” needs and situations.

A Small Example Application

Because SB is rooted in the standard Spring Framework, it is quite compatible with Groovy (and is also simple to integrate into Grails). As a demonstration, I’ll implement a simple batch processing job, as follows: read and parse an input data file (where each record is formatted according to a custom multiline format); validate each record (rejecting invalid records and writing the relevant record to a dump file); apply a specific transformation to each remaining (valid) record; and finally, write the valid, transformed data into an XML file.

Figure 1 provides a simplified, high-level view of the application’s overall workflow.

Figure 1: Simplified, high-level view of the workflow in the example application

Figure 1 shows how an application initiates a SB job. The job is composed of a number of steps, which are themselves composed of a number of substeps or tasklets. Tasklets are the building blocks of the application and may read/write/validate/transform or otherwise ‘munge’ the data.

Even though this is a very simple batch-processing task, there are a number of tricky areas. Consider the input data file format (an example is shown in Listing 1): this format contains data split into logical sections (customer ID, contact information, credit card information, customer balance), with one section to a line. Each line is a CSV-formatted record but some fields may themselves contain fixed-width records. The whole record is “book-ended” by BEGIN/END markers, each of which must have the same record number. The first two lines of the file are free-form commentary.

; this is a nasty file format
; it's going to be a challenge to process!
CARD,visa:1234 1234 4321 4321:000

Listing 1: Example input data record

Processing this data file is going to be quite a challenge and it is probably worth taking some time to consider how you would tackle this task in plain Java or Groovy.

It is safe to assume that the input data will contain numerous errors, which makes validation and error handling a priority for this application. Validation is a common, necessary chore that is not difficult but is tedious and error-prone; by relying on standard Spring technologies, SB helps simplify this task.

Another minor challenge is producing the output XML document. Listing 2 shows how the record given in Listing 1 should be written.

<?xml version="1.0" encoding="UTF-8"?>
  <customer sequence="0000000001">
    <number>1234 1234 4321 4321</number>

Listing 2: The resultant XML-formatted data record

As is always the case with infrastructures and frameworks, one often gets the feeling of overkill when working with a simple example such as the one in this article. Keep in mind however that as a problem gets bigger, a framework becomes more and more necessary. Remember also that SB was written with these large problem spaces in mind and so you may have difficulty seeing SB’s full potential using this one simple example application.

The Driver Application

The example application uses a small command-line application to kick off the real batch job. SB actually provides a simple command-line launcher to do this, but it is instructive to see how to deal with a SB job by hand. As Listing 3 shows, the launcher is a typically simple Spring Framework-aware application.


import o.s.batch.core.JobParametersBuilder
import o.s.context.support.ClassPathXmlApplicationContext

public class SpringBatch {

  public static void main(String[] args) {
    // load the SB infrastructure and the job definition from the classpath
    def context =
      new ClassPathXmlApplicationContext(['applicationContext.xml', 'job.xml'] as String[], true)

    def jobLauncher = context.getBean('jobLauncher')

    def job = context.getBean('job')

    def adjustmentPercent = 1.01D;

    // run the job, passing the adjustment percentage in as a job parameter
    def jobExecution = jobLauncher.run(job,
        new JobParametersBuilder().
          addDouble("adjustment.percent", adjustmentPercent).toJobParameters())
    jobExecution.with {
      println """
Job: $jobId
StartTime: $startTime; EndTime: $endTime
Duration: ${endTime.time - startTime.time} ms
"""
      stepExecutions.each { println "STEP: $it" }
    }
  }
}

Listing 3: The Groovy driver application

Note: throughout this article, the package name prefix ‘org.springframework’ is abbreviated to ‘o.s’ to reduce line length and aid formatting and readability.

Of interest here is the creation of the Spring application context. The application is looking for two files on its classpath: applicationContext.xml defines the boilerplate SB infrastructure and job.xml defines the SB job itself. This is a standard Spring development technique. I’ll look at these files in more detail later.

The job instance is created by the Spring application context and is looked up by the application itself; this is the real meat of the batch definition, as we shall see.

The jobLauncher instance obtained from the application context is part of SB. As the name suggests, it is concerned with mediating access to a SB job instance. In this case, it will control execution of the job instance defined in job.xml and retrieved by the application. Its run method returns a SB jobExecution instance that allows the application to determine the state of the associated job and its composite steps.

JobParametersBuilder provides a way of defining a map of parameters that can be passed into a job and subsequently accessed by the various steps and tasklets.

Listing 4 shows (slightly edited) the application in action.

Job: 0
StartTime: Fri May 01 16:25:50 EST 2009; EndTime: Fri May 01 16:25:54 EST 2009
Duration: 3117 ms
STEP: StepExecution: id=0, name=startupStep, status=COMPLETED,. . .
STEP: StepExecution: id=1, name=processStep, status=COMPLETED,. . .
STEP: StepExecution: id=2, name=endingStep, status=COMPLETED, exitStatus=COMP. . .

Listing 4: Output from executing the application

Boilerplate Configuration

SB requires a certain amount of standard configuration to be put in place in preparation for job execution.

Listing 5 excerpts the relevant portion of the applicationContext.xml file that contains this configuration.

<bean id="transactionManager"

<bean id="jobRepository"

<bean id="jobLauncher"
      p:jobRepository-ref="jobRepository" />

Listing 5: Minimal-functionality SB configuration

This is the least sophisticated SB configuration possible. This configuration establishes a no-op transaction manager and a pure in-memory jobRepository. The latter configuration option means that no distribution, persistence or job restart capabilities are available. For the purposes of this application, this configuration is sufficient.

In this example application, all processing is sequential and synchronous with the application; however, it is possible to configure the jobLauncher instance to execute a job asynchronously to the application. An asynchronous configuration would be appropriate if using SB in conjunction with a Grails/AJAX application, which could initiate a job and then poll for status and update a visual progress indicator until the job completes.

The Job Definition

The keystone of this application is actually the job.xml application context file. Because this is quite long, I will go through it in sections.

Note: The listings shown here have been edited and excerpted to save space (while hopefully remaining clear).

The full source code for this example is supplied with this edition of GroovyMag, of course.

Listing 6 shows the place where it all begins: the job definition.

<batch:job id="job">
  <batch:step id="startupStep" next="processStep">
    <batch:tasklet ref="logStartupMessage"/>
  <batch:step id="processStep" next="endingStep">
      <batch:chunk skip-limit="100000"
          <batch:stream ref="errorItemWriter"/>
        <batch:listener ref="skipListener"/>
  <batch:step id="endingStep">
    <batch:tasklet ref="logEndingMessage"/>

Listing 6: The job definition

This definition constructs a three-stage processing pipeline. While the first and last steps merely print a status message, the middle step (with id=processStep) is the most important and what I focus on here. The processStep step identifies the various input and output processors and also defines the intermediate transformations/processes that will be executed on each record.

An important SB concept introduced here is that of a chunk. A chunk defines the processing that is to be done within a transaction boundary: a batch of records that may be written/rolled-back as a whole, for instance. For this application, each record is treated as constituting a separate chunk and so error handling, etc., is done on a per-record basis.
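
To make the chunk idea concrete, here is a minimal sketch in plain Groovy. It is not how SB is implemented; processInChunks, commitInterval and the two closures are hypothetical names, and a "transaction" is modelled as nothing more than a buffer that is written as a whole. An item that fails processing is skipped without disturbing the rest of its chunk.

```groovy
// A sketch of chunk-oriented processing; NOT Spring Batch's real implementation.
// processInChunks and its parameters are hypothetical names used for illustration.
def processInChunks(List items, int commitInterval, Closure process, Closure write) {
  def skipped = []
  // collate() splits the input into sublists of at most commitInterval items
  items.collate(commitInterval).each { chunk ->
    def buffer = []
    chunk.each { item ->
      try {
        buffer << process(item)
      } catch (IllegalArgumentException e) {
        skipped << item        // a skippable failure: drop this item only
      }
    }
    write(buffer)              // the surviving items of a chunk are written together
  }
  skipped
}

def written = []
def skipped = processInChunks([1, 2, -3, 4, 5], 2,
    { item ->
      if (item < 0) throw new IllegalArgumentException("negative: $item")
      item * 10
    },
    { chunk -> written << chunk })

println "written: $written"
println "skipped: $skipped"
```

Passing a commitInterval of 1 to this sketch corresponds to the per-record handling used in this application: every record becomes its own chunk.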

The batch:streams, batch:listeners and batch:skippable-exception-classes configuration elements are all related to the way that erroneous input records are handled. This will be looked at later.

Processing Step One: Input

Listing 7 defines the itemReader bean (and some of the necessary associated configuration), which deals with reading and parsing the multiline record from the input data file.

<bean id="rawDataResource" class="">
  <constructor-arg value="resource/data/inputdata.dat"/>

<bean id="itemReader" class="">
  <property name="flatFileItemReaderDelegate">
    <bean class="o.s.batch.item.file.FlatFileItemReader"
      <property name="lineMapper">
        <bean class="o.s.batch.item.file.mapping.DefaultLineMapper"
          <property name="fieldSetMapper">
            <bean class="o.s.batch.item.file.mapping.PassThroughFieldSetMapper"/>

<bean id="multilineFileTokenizer"
  <property name="tokenizers">
      <entry key="BEGIN*" value-ref="beginLineTokenizer"/>
      <entry key="CUST*" value-ref="customerLineTokenizer"/>
      <entry key="CONT*" value-ref="contactLineTokenizer"/>
      <entry key="CARD*" value-ref="cardLineTokenizer"/>
      <entry key="BAL*" value-ref="balanceLineTokenizer"/>
      <entry key="END*" value-ref="endLineTokenizer"/>

<bean id="csvLineTokenizer"

<bean id="fixedLineTokenizer"

<bean id="beginLineTokenizer"

<bean id="endLineTokenizer"

<bean id="customerLineTokenizer"

Listing 7: Input record handling

Take your time to read through this code; the itemReader is quite a sophisticated piece of infrastructure.

As you piece things together, you will see how almost everything is delegated to standard SB classes: the actual reading of the file and skipping of the comment lines is deferred to a FlatFileItemReader, and the recognition and handling of the various logical sections in a record are left to the PatternMatchingCompositeLineTokenizer class (which itself defers to a number of line tokenizers). In fact, the only custom activity here is the mapping of the parsed data to a simple application-specific class via the MultilineRecordReader class. A tremendous amount of processing is being performed here with very little development effort.

Note how the use of abstract parent definitions (e.g., fixedLineTokenizer), makes it possible to write clearer configurations for several elements (e.g., beginLineTokenizer/endLineTokenizer). This is a standard Spring technique that helps to keep the configuration file DRY (i.e., without unnecessary repetition).

Again, consider how much effort would be involved if you had to do all this by hand.

Listing 8 and Listing 9 show the MultilineRecord and MultilineRecordReader classes.


public class MultilineRecord {
  String sequence
  String endSequence

  Long id

  String mobile
  String landline
  String email

  String provider
  String number
  String security

  BigDecimal balance

  @Override public String toString() { …elided… }
}

Listing 8: The MultilineRecord class


import …elided…

public class MultilineRecordReader implements
  ItemReader<MultilineRecord>, ItemStream {
  private FlatFileItemReader<FieldSet> flatFileItemReaderDelegate

  public MultilineRecord read() throws Exception {
    MultilineRecord mlr = null

    // flags to indicate the presence of component lines
    def customerSeen = false
    def contactsSeen = false
    def ccSeen = false
    def balSeen = false

    def line
    // each tokenized line arrives from the delegate as a FieldSet
    while (line = flatFileItemReaderDelegate.read()) {
      String prefix = line.readString(0)
      switch (prefix) {
        case 'BEGIN':
          mlr = new MultilineRecord(sequence: line.readString(1))
          break

        default:
          Assert.notNull(mlr, "MultilineRecord not yet initialised")
          switch (prefix) {
            case 'CUST':
              mlr.id = line.readLong(1)
              customerSeen = true
              break

            case 'CONT':
              mlr.with {
                mobile = line.readString(1)
                landline = line.readString(2)
                email = line.readString(3)
              }
              contactsSeen = true
              break

            case 'CARD':
              mlr.with {
                provider = line.readString(1)
                number = line.readString(2)
                security = line.readString(3)
              }
              ccSeen = true
              break

            case 'BAL':
              mlr.balance = line.readBigDecimal(1)
              balSeen = true
              break

            case 'END':
              // check all record fields seen
              Assert.isTrue(mlr && customerSeen &&
                            contactsSeen && ccSeen && balSeen,
                  "Incomplete Record Found")
              mlr.endSequence = line.readString(1)
              return mlr
          }
      }
    }
    null  // no more input: a null return tells SB the reader is exhausted
  }

  … elided …
}

Listing 9: The MultilineRecordReader class

The MultilineRecordReader class is responsible for allocating the various fields of each tokenized line (offered as SB FieldSets) to a single new instance of a MultilineRecord. It also performs some minor validation to ensure that there are no missing logical sections. Note how Groovy’s versatile switch statement and the with method that Groovy adds to every object make the processing much clearer than it would be in plain Java.
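
For readers new to these two Groovy features, the following standalone sketch shows them outside of SB. The Contact class, the describe method and the sample values are all hypothetical and exist only for this illustration.

```groovy
// Hypothetical Contact class and sample values, for illustration only.
class Contact {
  String mobile
  String email
}

// Groovy's switch can match on far more than constants
def describe(value) {
  switch (value) {
    case 'BEGIN':        return 'start marker'   // exact string match
    case ~/C[A-Z]+/:     return 'section line'   // regular expression (whole value must match)
    case ['BAL', 'END']: return 'trailer line'   // list membership
    case Integer:        return 'a number'       // type match
    default:             return 'unknown'
  }
}

// with() runs the closure against the object, so property names need no prefix
def contact = new Contact()
contact.with {
  mobile = '0412345678'
  email = 'bob@example.com'
}

println describe('CUST')
println contact.email
```

The same two idioms carry most of Listing 9: one switch dispatches on the line prefix, and with keeps the field assignments free of repeated references to the record.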

Processing Step Two: Output

As every first-year CompSci student learns, after input comes processing, followed by output. Naturally, I am not going to follow this sequence! In an attempt to achieve a less convoluted narrative, I’ll now look at output processing (the third stage of the processStep step). Listing 10 shows the requisite configuration for the itemWriter.

<bean id="itemWriter"
      p:marshaller-ref="multiLineRecordMarshaller" p:rootTagName="customers"

<bean id="multiLineRecordMarshaller" class="o.s.oxm.xstream.XStreamMarshaller">
  <property name="useAttributeFor">
      <entry key="sequence">
        <value type="java.lang.Class">java.lang.String</value>
  <property name="omittedFields">
      <entry key="" value="endSequence"/>
  <property name="aliases">
      <entry key="customer"

<bean id="processedOutputResource" class="">
  <constructor-arg value="resource/data/job-output.xml"/>

Listing 10: Output XML processing

The itemWriter configuration is quite straightforward. Output handling is actually delegated to a marshaller instance. In this case, the Spring OXM project is brought to bear to simplify XML generation. The XStreamMarshaller is only very minimally configurable (but quite performant…the age-old tradeoff): it is possible to render the sequence field as an XML attribute and not to render the endSequence field at all, but that is pretty much all the configuration possible.

Processing Step Three: Validation and Transformation

Now that we’ve seen the mechanics of writing a file, it is time to move on to (or more accurately: back to) the middle processing step.

Listing 11 shows the configuration of the processing tasks.

<bean id="compositeItemProcessor"
  <property name="itemProcessors">
      <ref local="validatingItemProcessor"/>
      <ref local="embiggenProcessor"/>

<bean id="validatingItemProcessor"
  <constructor-arg ref="validator"/>

<bean id="validator"
  <property name="validator">
    <bean id="luhnValidator"
      <property name="customFunctions">
          <entry key="luhn" value=""/>
      <property name="valang">
{ id : ? > 0 :
  'id field must be a natural number' }
{ endSequence : ? is not blank :
  'end sequence number field missing' }
{ sequence : ? is not blank :
  'begin sequence number field missing' }
{ endSequence : endSequence == sequence :
  'mismatched begin/end sequence numbers' }
{ mobile : match('\\d{10}',?) == true :
  'mobile field must be 10 digits' }
{ landline : match('\\d{8,10}',?) == true :
  'landline field must be 8-10 digits' }
{ provider : ? in 'amex', 'visa', 'macd' :
  'card provider field should be one of "visa", "amex" or "macd"' }
{ number : match('\\d{4}[ ]\\d{4}[ ]\\d{4}[ ]\\d{4}',?) == true :
  'card number field must match the format "xxxx xxxx xxxx xxxx"' }
{ number : luhn(?) == true :
  'card number luhn check failed' }
{ security : match('\\d{3}',?) == true :
  'card security field must be 3 digits' }
{ email : ? is not blank :
  'email field missing' }
{ email : email(?) == true :
  'email field is not a valid email address' }
{ balance : ? is not blank :
  'balance field missing' }

<bean id="errorOutputResource"
  <constructor-arg value="resource/data/errors.txt"/>

<bean id="passthroughLineAggregator"

<bean id="errorItemWriter"

<bean id="skipListener" class=""

<bean id="embiggenProcessor"

Listing 11: Configuration of processing tasks

For this application, the middle processing part of the pipeline is itself a two-stage composite process: first comes validation (and possibly rejection of invalid data), followed by transformation. This is configured through the compositeItemProcessor bean (which is referenced by the processStep step in Listing 6).

SB allows the use of any of the available Spring-compatible validation systems. For this example I have chosen to use the Valang Spring module, because of its clear declarative nature.

The validator bean contains a plaintext valang property which defines a series of expressions that should be evaluated against the various properties of the bean to which it is applied. For example:

{ id : ? > 0 : 'id field must be a natural number' }

If the given expression evaluates to false, validation fails and the associated message is added to the list of errors being maintained by the validator bean.
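
The rule-plus-message idea can be imitated in a few lines of plain Groovy. This is only a sketch of the concept and has nothing to do with Valang's actual API or syntax; the rules list and the validate closure are hypothetical names.

```groovy
// A plain-Groovy imitation of the rule/message idea; this is NOT Valang's API.
// The rules list and the validate closure are hypothetical.
def rules = [
  [field: 'id',     check: { it > 0 },             message: 'id field must be a natural number'],
  [field: 'mobile', check: { it ==~ '\\d{10}' },   message: 'mobile field must be 10 digits'],
  [field: 'email',  check: { it },                 message: 'email field missing']
]

// Evaluate every rule against the record and collect the messages of those that fail
def validate = { Map record ->
  rules.findAll { rule -> !rule.check.call(record[rule.field]) }*.message
}

def errors = validate([id: 0, mobile: '0412345678', email: ''])
errors.each { println it }
```

As with the valang property, the rules are data rather than control flow, which is what makes them easy to review and change.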

While Valang provides a series of standard functions (such as email, which checks to ensure that a field contains a valid email address), it cannot account for all possible requirements. It can be augmented via the customFunctions property, however, and it is this ability that allows me to define an application-specific function. To illustrate this, I’ll introduce a check for the validity of the credit card number field (using the so-called Luhn function; see “Learn More” for a reference to how this works), as is shown in Listing 12.


import …elided…

public class LuhnFunction extends AbstractFunction {

  public LuhnFunction(Function[] arg0, int arg1, int arg2) {
    super(arg0, arg1, arg2);
  }

  protected Object doGetResult(Object target) throws Exception {
    def str = getArguments()[0].getResult(target).toString()
    isValid(str)
  }

  public static boolean isValid(String cardNumber) {
    def sum = 0
    def addend = 0
    def timesTwo = false

    // Luhn doubles every second digit counting from the right,
    // so walk the digits in reverse order
    cardNumber.replaceAll(' ', '').reverse().each {dc ->
      def digit = Integer.valueOf(dc)
      if (timesTwo) {
        addend = digit * 2
        if (addend > 9)
          addend -= 9;
      } else {
        addend = digit
      }
      sum += addend
      timesTwo = !timesTwo
    }

    (sum % 10) == 0
  }
}

Listing 12: Luhn function class

This allows the use of the luhn() function as if it were a built-in Valang function:

{ number : luhn(?) == true : 'card number luhn check failed' }
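
The checksum itself can be tried without Valang on the classpath. The following is a standalone sketch of the Luhn check; luhnValid is a hypothetical name, and the test numbers are the card number from Listing 1 plus a widely quoted Luhn example.

```groovy
// Standalone sketch of the Luhn checksum; luhnValid is a hypothetical name.
boolean luhnValid(String cardNumber) {
  // work from the rightmost digit, ignoring any spaces
  def digits = cardNumber.replaceAll(' ', '').reverse().collect { Integer.parseInt(it.toString()) }
  def sum = 0
  digits.eachWithIndex { digit, index ->
    if (index % 2 == 1) {          // every second digit, counting from the right
      digit *= 2
      if (digit > 9) digit -= 9    // same as adding the two digits of the product
    }
    sum += digit
  }
  sum % 10 == 0
}

println luhnValid('1234 1234 4321 4321')
println luhnValid('1234 1234 4321 4322')
```

Changing any single digit of a valid number makes the check fail, which is exactly the kind of keying error the function is meant to catch.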

Valang is a powerful and effective validation framework. Its most important feature is probably that since it uses “near natural language” configuration, an application’s validation rules can be reviewed and changed by any appropriate product owner or business representative and this can make for a higher quality product. Valang is a standalone framework and well worth further study (see the “Learn More” section for the URL).

Refer back to Listing 6. You will see that in the event of a ValidationException, processing of a chunk is skipped. This application registers a listener for this situation that simply writes the offending record to a configured itemWriter. Listing 13 shows the appropriate class.


import org.springframework.batch.core.listener.SkipListenerSupport

public class SkipListener extends SkipListenerSupport<MultilineRecord, Object> {

  def writer

  public void onSkipInProcess(MultilineRecord item, Throwable t) {
    writer.write ( [ item ] )
  }
}

Listing 13: Error handling skip listener class

The configured itemWriter has an associated line aggregator that performs preprocessing before the actual write takes place. In this case, the PassThroughLineAggregator class simply performs a toString operation on the presented item, as Listing 14 shows.


import org.springframework.batch.item.file.transform.LineAggregator

public class PassThroughLineAggregator implements LineAggregator<MultilineRecord> {
  public String aggregate(MultilineRecord item) {
    item.toString()
  }
}

Listing 14: PassThroughLineAggregator class

The second part of the composite processing step shown in Listing 11 deals with transforming the now-valid data record. This particular transformation represents a business rule: as each record is transformed, its balance should be adjusted by a certain percentage. This requires a small piece of Groovy, the EmbiggenProcessor. Listing 15 shows this simple class in its entirety.


import org.springframework.batch.item.ItemProcessor

public class EmbiggenProcessor
  implements ItemProcessor<MultilineRecord, MultilineRecord> {

  def percent

  public MultilineRecord process(MultilineRecord mlr) throws Exception {
    mlr.balance *= percent
    mlr  // hand the adjusted record on to the writer
  }
}


Listing 15: The EmbiggenProcessor class

If you refer back to Listing 3 and Listing 11, you will see how the percent value is injected into the EmbiggenProcessor class from the application via the job.xml file.


To paraphrase Benjamin Disraeli, first Earl of Beaconsfield: “there are lies, damn lies and performance measurements.” Since many batch jobs deal with very large data sets, the performance of SB is bound to be of paramount interest to some. I am a coward! I am going to merely dip my toe into this potentially turbulent topic and just let you know that on my laptop, the complete processing of 42,500 records took 92,381 ms. That makes about 2.17 ms per record. Not too shabby in my opinion–but of course, your mileage may vary.

Wrapping Up

You’ve now walked through a complete SB application. A large proportion of the application is declarative configuration. I like this style of working: it reduces the amount of ‘real’ coding that is required and thus minimizes the opportunity for error. For those parts of the problem space that are not directly covered by the standard components, Groovy has proved to be a very effective tool; with minimal effort and coding, Groovy has allowed me to very effectively concentrate on creating a clear solution, which also minimizes the opportunity to introduce bugs.

I continue to find it impressive that Groovy—a dynamic language—can still effectively work with highly-typed interfaces such as those exposed by SB: checked exceptions, strongly typed parameter lists, and even generic classes can be handled with ease. At the same time, Groovy can let me work with frameworks such as Valang without forcing me to deal with ‘nasties’ such as adapter classes, proxies, out-of-process adapters, etc. This means that an inveterate Java programmer like myself can continue to apply his existing skillset—with its hard-won collection of lessons learned, tricks and tips picked up over time—while also taking advantage of the productivity and ease of use of Groovy’s modern dynamic language. In my not-so-humble opinion, this is important and will surely contribute to a rapid increase in Groovy’s popularity.

Learn More

Spring Batch
Spring Valang module
Spring OXM
The Luhn Function

Tags: Groovy, GroovyMag, Programming, Tools

MarkupBuilder, How Do I love Thee…

In oh, so many ways!

Groovy’s MarkupBuilder class can really clean up your code.

import groovy.xml.*

def out = new StringWriter()
def b = new MarkupBuilder(out)
b.table(id: 'test', border: 0) {
  tbody {
    tr {
      td "hello"
      td(id: 'hello_id', 'hello')
      td(id: 'test', /hello/)
      td(id: 'esc') { mkp.yield '&nbsp;' }
      td(id: 'unesc') { mkp.yieldUnescaped '&nbsp;' }
      td(id: 999) { mkp.yield 999 }
      td(id: 99) { mkp.yieldUnescaped 99 }
      td(id: 88) { mkp.yieldUnescaped 'hello' }
      td(id: 77) { mkp.yieldUnescaped "hello2" }
      td(id: 66) { mkp.yieldUnescaped """hello3""" }
      // gives compilation error "unexpected token: } at line: 19, column: 48":
      //   td(id: 55) { mkp.yieldUnescaped /hello4/ }
      td(id: 55) { mkp.yieldUnescaped "" + /hello4/ }
    }
  }
}

println out.toString()

This produces:

<table id='test' border='0'>
  <tbody>
    <tr>
      <td>hello</td>
      <td id='hello_id'>hello</td>
      <td id='test'>hello</td>
      <td id='esc'>&amp;nbsp;</td>
      <td id='unesc'>&nbsp;</td>
      <td id='999'>999</td>
      <td id='99'>99</td>
      <td id='88'>hello</td>
      <td id='77'>hello2</td>
      <td id='66'>hello3</td>
      <td id='55'>hello4</td>
    </tr>
  </tbody>
</table>

This was just another of those “external brain dump” posts…I didn’t want to lose my memory and there are a few subtle points (or at least a few things that I found out by trial-and-error). There’s not too much other stuff around.

Tags: Groovy, Programming

Solving the Enterprise Integration Puzzle with Spring Integration

Back in May 2009, I wrote an article for GroovyMag.

GroovyMag is very nice in that it allows the author to republish their work after a ‘decent’ interval has passed (either 60 or 90 days, can’t quite remember at the moment). That period is well and truly past for this article so here it is in its full resplendency. Enjoy!

Forgot to make the source available when I first republished this article. Silly me. All fixed now!


Solving the Enterprise Integration Puzzle
Getting Started with Spring Integration

Groovy is frequently promoted as “the scripting language for the JVM.” Grails is often described as being “great for smaller, quick-to-build web applications.” Neither of these pieces of received wisdom really gives the Groovy/Grails combination its due: a powerful tool in the Systems Integrator’s armory. This article will show how Groovy and Grails–in conjunction with SpringSource’s up-and-coming “Spring Integration” project–can make complex systems integration tasks as easy as building a standalone web application.

A Brief Overview of Spring Integration

Systems integration is frequently likened to attempting to piece together a badly made jigsaw puzzle. The end goal is often quite fuzzy; there may be several potential solutions to consider; there may be many conflicting approaches to evaluate; and when all is said and done, sometimes all that can be done is to say “To heck with the Grand Plan!” and ‘adapt’ the puzzle using a pair of nice sharp scissors.

Spring Integration (SI) is a recent project within the Spring stable that provides numerous tools to make the systems integration task less frustrating. SI builds on several existing Spring stablemates, but provides a unified way of configuring these components and linking them together. According to SI, all message endpoints (which act with and on behalf of application components) interact via messages sent over various flavors of messaging channels, perhaps undergoing various transformations along the way. In addition, external resources (files, JMS-based queueing systems, SOAP-based WebServices, etc.) interact with these channels via simple-to-configure adapters.

SI imposes very little onto the developer; almost everything in the system can be implemented using simple POG/JOs (Plain Old Groovy/Java Objects) and configuration is typically via XML or Annotations.

SI provides out-of-the-box support for many of the patterns defined in Hohpe and Woolf’s Enterprise Integration Patterns book and associated website and–as this article will show–is quite “Groovy friendly.”

A Small Example Application

In this article, I am going to take a very simple systems integration task and build a small example application. The application is a small Grails application that performs a simple function: given a customer number, it calculates the shipping cost for a single widget.

To show how Spring Integration can be used, I am assuming that a customer will enter their customer ID into a Grails-originated web page, which is then routed to one instance of a set of JMS servers. The associated address data is then retrieved, the city field extracted and then passed on to a SOAP-based WebService that calculates the cost of shipping widgets from Brisbane to the customer’s home city. To add a smidgen of realism, I have built the various services to be completely unrelated; in particular, they do not reference any form of standardized messaging schema. The integration pipeline needs to handle various simple message types, including strings, CSV-formatted data records, XML-formatted strings and floats.

Figure 1 provides a high-level view of the application.


Figure 1: The processing flow and major components comprising the example application

The various steps highlighted in this flow are:

  1. The customer ID (a simple string) is entered into the Grails web application and passed off to a generic SI gateway. The gateway places the customer ID on an internal channel that leads to a message router.
  2. The router retrieves the message from its inbound channel and determines which available channel to use to ensure that the message is handled by the appropriate JMS server instance.
  3. The message is then handled by an SI JMS “channel adapter,” which deals with the mechanics of working with the underlying external JMS server.
  4. The nominated JMS server instance produces a record (a CSV-formatted string) that is placed onto a shared reply channel.
  5. The CSV record is removed from the shared channel and transformed. The result of the transformation is an XML-formatted string, which is in turn placed on a ‘downstream’ channel.
  6. The WebService channel adapter removes the string from its request channel and invokes the external SOAP WebService for further processing.
  7. The WebService response (a single float) is then passed back to the originating messaging gateway, which returns it to the Grails controller.
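
The seven steps above can be mimicked in a few lines of plain Groovy, which may help to fix the shape of the flow in mind before the real components appear. This is only a sketch: it involves no SI, JMS or SOAP at all, and every name in it (servers, route, lookup, toXml, shippingCost, pipeline) as well as the XML layout is a hypothetical stand-in. The sample records and costs are borrowed from the listings later in this article.

```groovy
// In-memory stand-ins for the two partitioned JMS servers (steps 3 and 4).
def servers = [
  AL: ['b9876': '123 Nowhere St,Brisbane'],
  MZ: ['x5555': '123 Fake St,Hobart']
]

// Step 2: choose a partition from the first character of the customer ID.
def route = { String custId -> custId ==~ /(?i)[a-l].*/ ? 'AL' : 'MZ' }

// Steps 3 and 4: the chosen "server" replies with a CSV record.
def lookup = { String partition, String custId -> servers[partition][custId] }

// Step 5: turn the CSV record into an XML-formatted request string.
def toXml = { String csv ->
  def city = csv.tokenize(',')[1]
  "<request><to>${city}</to><items>1</items></request>".toString()
}

// Steps 6 and 7: a stand-in for the SOAP call, returning a float.
def costsFromBrisbane = [Brisbane: 0.0F, Hobart: 88.88F]
def shippingCost = { String xml ->
  def city = (xml =~ /<to>(.*?)<\/to>/)[0][1]
  costsFromBrisbane[city]
}

// Steps 1 and 7: the "gateway" accepts an ID and hands back the cost.
def pipeline = { String custId ->
  shippingCost(toXml(lookup(route(custId), custId)))
}

assert pipeline('b9876') == 0.0F
assert pipeline('x5555') == 88.88F
```

Each closure here corresponds to something that SI supplies as a configurable component; the point of the real application is that none of this glue has to be written by hand.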

This is a very simple system and it may seem that this is overkill. There is a payoff, however: SI makes it easy to plug in disparate (often legacy) systems and thus produces an expandable framework that is capable of being built out to cover future, more complex, situations.

The Pieces of the Jigsaw Puzzle

Before starting out on the integration exercise proper, it is worth taking a quick look at the various pieces of the jigsaw puzzle that are going to be joined together.

You will quickly see that all the pieces are exceptionally simplistic. I have almost completely ignored real-life issues such as error handling, logging and configuration. This is by design: for the purposes of this article, I want to focus mostly on SI, not on the various services.

The Grails Web Application

Figure 2 shows the Grails web interface “in action.” It is very simple, with only a single form and a single results page.


Figure 2: The deployed Grails web application

The ‘meat’ of the application (as far as this article is concerned) lies in the single Grails controller class shown in Listing 1.

public class SpringIntegrationController {

  // injected from the Spring application context
  def siGateway

  def index = { }

  def submit = {
    long start = System.currentTimeMillis()
    def msg = params.custid
    // hand the customer ID to SI, then wait for the reply
    siGateway.send(msg)
    def got = siGateway.receive()
    flash.message =
      "Query took: ${System.currentTimeMillis() - start} millis."
    [shippingCost: got]
  }
}

Listing 1: The Grails controller class

There are two points of interest in this simple class.

The siGateway instance injected from the Spring application context constitutes the sole point of contact between the Grails web application and the SI framework. (In theory, SI allows for an even less intrusive interface than this, based around dynamically constructed proxies, but a bug surfaced as I started writing the application…no matter, the hard way is not too onerous to use and I am sure that the SI guys will soon squash the bug.)

The submit closure is responsible for handling the POSTed form data and for driving the injected siGateway instance through its paces. It is pretty straightforward: all the action is happening over in “SI-land.”

The JMS Application

For the example application, I have written a very simple JMS service (using Apache ActiveMQ as the messaging provider), as shown in Listing 2.

import javax.jms.Session
import org.apache.activemq.ActiveMQConnectionFactory

public class AMQService {
  private static factory =
    new ActiveMQConnectionFactory("tcp://localhost:61616")

  public static void main(String[] args) throws Exception {
    def cKey = args[0]   // environment section to select
    def cFile = args[1]  // location of the configuration file

    println 'Starting...'
    println "Config key: $cKey"
    println "Config file: $cFile"

    def config =
      new ConfigSlurper(cKey).parse(new File(cFile).toURL())

    def database = config.database
    println "Database: $database"

    def inQ = config.inQ
    println "Incoming Queue: $inQ"

    def replyTopic = config.replyTopic
    println "Reply Topic: $replyTopic"

    // consumer side: this instance's own request queue
    def qConn = factory.createQueueConnection()
    def qSession =
      qConn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE)
    def consumer =
      qSession.createConsumer(qSession.createQueue(inQ))

    // producer side: the reply topic shared by all instances
    def tConn = factory.createTopicConnection()
    def tSession =
      tConn.createTopicSession(false, Session.AUTO_ACKNOWLEDGE)
    def producer =
      tSession.createProducer(tSession.createTopic(replyTopic))
    def replyMessage = tSession.createTextMessage()

    qConn.start()  // nothing is delivered until the connection is started

    for ( ; ; ) {
      process(database, replyMessage, consumer, producer)
    }
  }

  private static process(database, replyMessage, consumer, producer) {
    try {
      def textMessage = consumer.receive()  // indefinite blocking
      replyMessage.with {
        setText(getResponse(database, textMessage.getText()))
      }
      producer.send(replyMessage)
    }
    catch (Throwable t) {
      t.printStackTrace()
    }
  }

  private static getResponse(database, inMsg) {
    def resp = database[inMsg]
    println "Request: '$inMsg'; Response: '$resp'"
    resp
  }
}

Listing 2: The JMS server

This is a standard JMS service; no need to elaborate.

One point to note (that is not directly SI-related, but of interest nonetheless) is that the service is written to allow multiple instances to run at the same time, dealing with a partitioned data space. For the purposes of the example application, I have configured two instances. One instance is prepared to deal with customer IDs whose first character is in the range a-l and the second will deal with the rest of the alphabet.

The service accepts input via a uniquely-designated input queue, but sends its response to a shared topic. This structure simplifies things by ensuring that any downstream component merely has to handle a single publish/subscribe channel (which can carry data from multiple sources), rather than trying to deal with a multitude of point-to-point links.

Each server instance is driven by a pair of application parameters, defining (in order):

  • The appropriate environment section in the designated configuration file
  • The location of the configuration file to use

The configuration file supplies:

  • The name of the input queue to use
  • The name of the output topic to receive the responses
  • The actual static database to use (I did warn that these applications were extremely simple!)

The service makes use of ConfigSlurper’s ability to deal with different environment sections and so I can define a single configuration file (see Listing 3) to configure the different instances uniquely.



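environments {
  al {
    // queue and topic names as declared in Listing 6
    inQ='SI.AL.REQUEST'
    replyTopic='SI.REPLY.TOPIC'
    database=['b9876':'123 Nowhere St,Brisbane',
              'f0234':'42 Imaginary Place,Darwin']
  }
  mz {
    inQ='SI.MZ.REQUEST'
    replyTopic='SI.REPLY.TOPIC'
    database=['x5555':'123 Fake St,Hobart',
              'w8888':'000 Talkfest Lane,Canberra']
  }
}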

Listing 3: JMS server configuration file
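
To see ConfigSlurper's environment handling in isolation, the following sketch parses one configuration text twice, once per environment key. The configuration is supplied inline as a string purely for illustration; its property names (inQ, replyTopic, database) are the ones the service in Listing 2 reads, while the layout and the variable names (script, al, mz) are assumptions made for this example.

```groovy
// Hypothetical configuration text; a real service would read it from a file.
def script = '''
replyTopic = 'SI.REPLY.TOPIC'

environments {
  al {
    inQ = 'SI.AL.REQUEST'
    database = ['b9876': '123 Nowhere St,Brisbane']
  }
  mz {
    inQ = 'SI.MZ.REQUEST'
    database = ['x5555': '123 Fake St,Hobart']
  }
}
'''

// The constructor argument selects which environment section is merged in.
def al = new ConfigSlurper('al').parse(script)
def mz = new ConfigSlurper('mz').parse(script)

// Each instance sees only its own queue and data...
assert al.inQ == 'SI.AL.REQUEST'
assert mz.inQ == 'SI.MZ.REQUEST'
assert al.database['b9876'] == '123 Nowhere St,Brisbane'
assert !mz.database.containsKey('b9876')

// ...while settings outside the environments block are shared.
assert al.replyTopic == mz.replyTopic
```

Settings placed outside the environments block apply to every instance, which is one way to avoid repeating shared values such as the reply topic.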

This service accepts a string, but returns a CSV-formatted record.

The WebService

The final part in the jigsaw puzzle is a SOAP WebService built on top of the facilities provided by GroovyWS.

The service’s raison d’être is to retrieve the cost of shipping n widgets between major cities in Australia; see Listing 4 and Listing 5.

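public class ShippingCostWebService {

  // Cost of shipping one widget, indexed as costs[from][to].
  // Routes absent from the table as published are filled in by
  // assuming that costs are symmetric (and zero within a city).
  static costs = [
    Brisbane: [ Brisbane: 0.0F,
                Canberra: 22.22F,
                Darwin: 44.44F,
                Hobart: 88.88F ],
    Darwin: [ Brisbane: 44.44F,
              Canberra: 33.33F,
              Darwin: 0.0F,
              Hobart: 99.99F ],
    Canberra: [ Brisbane: 22.22F,
                Canberra: 0.0F,
                Darwin: 33.33F,
                Hobart: 77.77F ],
    Hobart: [ Brisbane: 88.88F,
              Canberra: 77.77F,
              Darwin: 99.99F,
              Hobart: 0.0F ]
  ]

  Float calculateShippingCost(String fromLoc,
                              String toLoc,
                              Integer nItem) {
    def cost = costs[fromLoc][toLoc] * nItem
    println "calculateShippingCost($fromLoc,$toLoc,$nItem) => $cost"
    cost
  }
}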

Listing 4: The shipping cost WebService

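println 'Starting...'

// publish the service at the address used by the SI configuration
def ws = new groovyx.net.ws.WSServer()
ws.setNode('ShippingCostWebService',
           'http://localhost:6980/ShippingCostWebService')
ws.start()

println '...Started.'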

Listing 5: The GroovyWS WebService server harness

I can’t lie! This is just an old WebService that I had lying around that I created for a conference presentation, and which I have adapted for this application. It doesn’t have a customized API and isn’t necessarily a perfect fit to the task. In the systems integration world, improvisation is often required to squeeze data out of the numerous ‘sub-optimal’ nooks and crannies that are found all over an enterprise.

Joining the Pieces Together

Now that you are familiar with the basic componentry, it is time to look at SI proper.

As with any Spring project, SI is configured separately from the actual code in an application. Since this application is based on Grails, I have chosen to use the nice resources.groovy Grails Spring Beans DSL configuration file, rather than the ‘traditional’ XML-based configuration.

Listing 6 shows the complete configuration.

beans = {
  xmlns si:"http://www.springframework.org/schema/integration"
  xmlns jms:"http://www.springframework.org/schema/integration/jms"
  xmlns stream:"http://www.springframework.org/schema/integration/stream"
  xmlns ws:"http://www.springframework.org/schema/integration/ws"

  // SI componentry and plumbing
  si {
    poller(default: true) {
      "interval-trigger"(interval: 1, "time-unit": "SECONDS")
    }
    router("input-channel": "routerChannel",
           ref: "highLowRouter",
           method: "route")
    transformer("input-channel": "InboundTopic",
                "output-channel": "TransformedInbound",
                ref: "csvStringTransformer",
                method: "transform")
    channel(id: "routerChannel")
    channel(id: "ALOutboundChannel")
    channel(id: "MZOutboundChannel")
    channel(id: "RoutingRejectChannel") {
      queue(capacity: "256")
    }
    channel(id: "TransformedInbound") {
      queue(capacity: "16")
    }
    channel(id: "InboundTopic") {
      queue(capacity: "16")
    }
    channel(id: "ShippingCostChannel",
            dataType: "java.lang.Float") {
      queue(capacity: "16")
    }
  }

  stream {
    "stderr-channel-adapter"(channel: "errorChannel",
                             "append-newline": true)
  }

  // WebService stuff
  ws {
    "outbound-gateway"("request-channel": "TransformedInbound",
                       "reply-channel": "ShippingCostChannel",
                       uri: "http://localhost:6980/ShippingCostWebService")
  }

  // AMQ/JMS stuff
  connectionFactory(org.apache.activemq.pool.PooledConnectionFactory) { bean ->
    bean.destroyMethod = "stop"
    connectionFactory = { org.apache.activemq.ActiveMQConnectionFactory cf ->
      brokerURL = "tcp://localhost:61616"
    }
  }

  "AL.REQUEST"(org.apache.activemq.command.ActiveMQQueue, "SI.AL.REQUEST")
  "MZ.REQUEST"(org.apache.activemq.command.ActiveMQQueue, "SI.MZ.REQUEST")
  "REPLY"(org.apache.activemq.command.ActiveMQTopic, "SI.REPLY.TOPIC")

  jms {
    "outbound-channel-adapter"(channel: "ALOutboundChannel",
                               destination: "AL.REQUEST")

    "outbound-channel-adapter"(channel: "MZOutboundChannel",
                               destination: "MZ.REQUEST")

    "message-driven-channel-adapter"(channel: "InboundTopic",
                                     destination: "REPLY")
  }

  // general componentry
  siGateway(org.springframework.integration.gateway.SimpleMessagingGateway) {
    requestChannel = ref("routerChannel")
    replyChannel = ref("ShippingCostChannel")
    replyTimeout = "10000"
  }

  highLowRouter(LookupRouter)

  csvStringTransformer(CsvTransformer) {
    homeBase = 'Brisbane'
  }
}

Listing 6: The Spring Integration configuration

As you read the following discussion, you may find it useful to refer back to Figure 1.

The configuration is structured into various sections, according to the needs of SI and of the various resources that are being integrated.

The si section (more precisely, those elements of SI configured via the XML namespace allocated the si prefix in this document) configures the various aspects of SI itself. This includes establishing the various channels that interlink the components, defining a few application-specific components (such as the router and transformer), and putting in place a default polling schedule for those parts of SI that need to poll.

It is worth examining the router and transformer components here. I have mentioned that the JMS services deal with a partitioned data space and that traffic is directed to an individual service based on the actual data message being processed. SI allows for the definition of an application-specific router to perform this type of task, in this case the method route of the highLowRouter instance. Listing 7 shows the code for the router.


public class LookupRouter {

  // returns the name of the channel that should carry the message
  public String route(String msg) {
    switch(msg) {
      case ~/(?i:[a-l].*)/: return 'ALOutboundChannel'
      case ~/(?i:[m-z].*)/: return 'MZOutboundChannel'
      default: return 'reject'
    }
  }
}

Listing 7: The router class

The router examines the incoming message data and determines the appropriate outbound path for the message. Groovy’s wonderfully versatile switch statement, combined with the ability to do case-independent pattern matching, makes for beautifully minimalistic code.
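
To make the behaviour concrete, the same switch can be exercised on its own, outside SI. The sketch below wraps the logic of Listing 7 in a closure (the name route is just a convenience for this example) and checks a few sample customer IDs, including an upper-case one and one that matches neither range.

```groovy
// The routing decision from Listing 7, lifted into a standalone closure.
def route = { String msg ->
  switch (msg) {
    case ~/(?i:[a-l].*)/: return 'ALOutboundChannel'
    case ~/(?i:[m-z].*)/: return 'MZOutboundChannel'
    default: return 'reject'
  }
}

assert route('b9876') == 'ALOutboundChannel'
assert route('F0234') == 'ALOutboundChannel'  // (?i: ...) ignores case
assert route('x5555') == 'MZOutboundChannel'
assert route('12345') == 'reject'             // first character is not a letter
```

A regular expression used as a case label must match the whole value, which is why each pattern ends in .* to swallow the rest of the customer ID.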

The transformer is equally simple, as Listing 8 shows.


public class CsvTransformer {
  def homeBase

  public String transform(String csv) {
    def dest = csv.tokenize(',')[1]


Listing 8: The transformer class

This code is presented with an input CSV-formatted record, extracts the relevant field, and passes an XML-formatted string onwards.

Recall that the WebService’s API is not custom-built for the purpose of this application; for this use, arg2 (the nItems parameter) is always fixed at 1.
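
As a rough illustration of this kind of CSV-to-XML step, here is a minimal sketch built on Groovy's MarkupBuilder. It is not the transformer used in the application: the class name CsvToXmlSketch is hypothetical, the element names (calculateShippingCost, arg0, arg1, arg2) are assumed from the WebService method and the positional argument naming mentioned above, and the XML namespace that a real SOAP payload would need is omitted.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical stand-in for a CSV-to-XML transformer.
class CsvToXmlSketch {
  String homeBase = 'Brisbane'

  String transform(String csv) {
    // the city is the second comma-separated field of the record
    def dest = csv.tokenize(',')[1]
    def out = new StringWriter()
    new MarkupBuilder(out).calculateShippingCost {
      arg0(homeBase)  // fromLoc
      arg1(dest)      // toLoc
      arg2(1)         // nItems, always a single widget here
    }
    out.toString()
  }
}

def xml = new CsvToXmlSketch().transform('123 Fake St,Hobart')
assert xml.contains('<arg0>Brisbane</arg0>')
assert xml.contains('<arg1>Hobart</arg1>')
assert xml.contains('<arg2>1</arg2>')
```

The builder approach keeps the XML structure visible in the code, which is handy when the target schema is dictated by somebody else's service.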

The stream section of the configuration merely allocates a generic channel listener so that the standard errorChannel is not silently ignored; this configuration will ensure that errors/exceptions, etc., are logged.

The ws section deals, unsurprisingly, with WebService integration. The section configures an SI outbound gateway that is capable of receiving a request on a specified channel, invoking the configured WebService and finally sending the result to the appropriate response channel. All this is specified declaratively: the actual mechanism of dealing with the WebService (retrieving WSDL documents, generating proxies and handling XML-based request/response messages) is completely hidden away.

The remaining configuration concerns SI’s JMS adapter and ActiveMQ integration.

The siGateway has been discussed previously.

By default, all SI’s JMS adapters reference a bean with the standard name of connectionFactory. It is the responsibility of this bean to ‘vend’ connections to the external JMS system. In this case, a connection pool to a local Apache ActiveMQ server running at the URL “tcp://localhost:61616” is being created. The external connections are pooled for efficiency, so they are reused rather than discarded after each use; specifying destroyMethod=’stop’ ensures that the pool is shut down, and its connections released, when the application context closes.

Note: I found the actual syntax used to declare the connectionFactory a little tricky. I had to ask for help on the excellent mailing list to get it right! Thanks to the generous souls who pointed me in the right direction.

Following the definition of the connectionFactory, we create two instances of ActiveMQ queues and an instance of an ActiveMQ topic. It is these that carry the messages to/from the external JMS resources.

The jms section configures two JMS-specific outbound channel adapters and one inbound channel adapter. As the names suggest, these are concerned with driving (or being driven by) the ActiveMQ instances declared above. The configured message-driven-channel-adapter instance defines a component that will be asynchronously “actively invoked” to handle an incoming message. It is possible to configure a polling-oriented component instead, but this is generally less flexible, may waste CPU, and can increase latency in the system.

As with the WebService gateway, everything is declarative.

Admiring the Finished Puzzle

SI has enabled a deceptively simple solution to a fairly complex problem. There are no messy tracts of confusing mechanistic code; the system is essentially defined in a single configuration; the individual components are simple and (generally) reusable; and the groundwork for coping with change has been laid with very little effort.

SI is a simple but powerful toolkit; the example application I have discussed here has really just touched the surface of what is possible. SI provides much more, including facilities for secure channels, inbound/outbound mail handling, event handling, RMI integration and much better XML handling than I have touched upon here.

Remember that these are still early days for SI: for example, there are no debuggers or GUI editors as found in the big “name brand” systems integration tools. In time these tools may come, but for now remember that SI is orders of magnitude simpler, and the SI/Groovy/Grails triumvirate makes the developer many times more productive than some other tools I have used.

I hope that I have stimulated your interest in SI and shown that a Grails application need not be limited to serving up “little web apps” but can actually service the needs of larger-scale systems integration tasks.

I hope that I have also been able to show that when the time comes to get out the scissors, Groovy is more than equal to the task of “slicing and dicing” until the pieces of the systems integration jigsaw puzzle fit nicely together.

Learn More

Bob Brown is the director and owner of Transentia Pty. Ltd. Based in beautiful Brisbane, Australia, Bob is a specialist in Enterprise Java and has found his niche identifying and applying leading-edge technologies and techniques to customers’ problems.

Watch out for my other Groovymag articles…coming soon to a browser near you.

Tags: Grails, Groovy, Programming, SOA, Tools

About Bloody Time!

Microsoft Joins W3C SVG Working Group.

Maybe it means something good is around the corner, maybe not. One can but hope…

Tags: Tools

Happy Birthday Transentia!


Transentia is ten years old today.

Transentia is a baby of the dot-com era. We started out wrangling Java and are coming into the second decade having adopted Groovy as a technology platform.

This is a good time to reflect on the origin of the name…

It all started while I was a member of the “Business Development” team at DSTC.

After a session wrestling with a group of research staff (collective noun: a recalcitrant?) a colleague (who shall remain nameless) stormed into my office and yelled “I don’t know what we are doing, but it sure as hell isn’t technology transfer.”

He came up with the phrase “Technology Transferention”, explaining: “It sounds good and can impress people who don’t know any better, but it doesn’t actually mean a darned thing. Exactly right for what we do.” That phrase got a fair bit of mileage :-)

While casting around for a company name I recalled this event.

I decided that “Transferention Technologies” was a bit too much of a mouthful and munged it around a bit; thus the name Transentia was born.

(The alternative was a play on another phrase the team used when confronted with an absurd situation: “the sky is green, the trees are blue.” That got a lot of airplay too, but I couldn’t quite work it into a good company name: “Green Sky Technologies” sounds a bit too vomit-induc{ed/ing}.)

Nowadays, there’s a “Land of Transentia” in some online game, a (Japanese?) musician, and a cybersquatter is sitting on ‘’ (and can continue to sit, as far as I am concerned. I deliberately chose an Australian domain…we are a much more exclusive club :-))

Tags: Retrospectives

Font Squirrel

Apparently: “Free fonts have met their match.”

I found the site as I was reading the page describing the new Web Open Font Format for Firefox 3.6.

Worth remembering.

Tags: Tools

Microsoft Calendar Printing Assistant

My wife is always asking me to print calendar ‘blanks’ for her.

The Calendar Printing Assistant for Outlook 2007 is a nice freebie from Microsoft to do just that (and more, of course).

There are loads of styles to choose from and it can even print year calendars.


This makes up for the strange oversight in Outlook proper.

It’s really quite grouse!

Tags: Tools

Truth In Advertising!

Well I never!

This was broadcast on 31 Dec, 2009 at 21:52 on (Brisbane, Australia) Channel 7 Digital free-to-air T.V.:


I am tickled pink! It’s not often that one sees a piece of quack medicine forced to identify itself as such.

In case you can’t read it clearly, the last part says:

…failed to provide any evidence that the advertised Ease-A-Cold product can shorten colds or reduce the duration of colds.

If only we could get the cosmetics companies as well…as this article in New Scientist says:

…cosmetics companies release very little in the way of trial data. In medical practice, the gold standard for proof of persistent benefit to a patient is the double-blind randomised controlled trial (RCT), published in a peer-reviewed journal. So why do we not see such a well-established methodology used for cosmetics?

The industry faces a dilemma. If a rigorous trial of an anti-ageing cream showed no benefit, no one would buy it. Yet if it really does produce structural changes and permanently reduces wrinkles, the cream could be reclassified as a pharmaceutical agent – which would mean it could no longer be sold to the public unless prescribed by a doctor.

As this article rightly points out:

A businessman’s perspective might be `I can market this product today without a trial, probably make more liberal claims about it, and get millions of dollars-worth of exposure instead’.

I wonder if a retraction like the one above actually has any effect on profits. Or indeed, sales?

In a related observation, I was shopping for a new pillow in the local Boxing Day sales when I came across a beauty: a natural-latex pillow infused with activated charcoal to provide a hygienic sleep environment that can also absorb any harmful electromagnetic radiation that might be passing by.

I paraphrase (I really should have taken a brochure, but I simply couldn’t: I didn’t want to be responsible for sending even dross like that to the landfill), but you get the idea.

Such fakery! The Demon-Haunted World remains a scary, confusing place for too many, it seems.

Tags: Rant


Following on from the previous post about JavaMelody

JRobin supports all standard operations on Round Robin Database (RRD) files: CREATE, UPDATE, FETCH, LAST, DUMP, XPORT and GRAPH.


It mimics RRDtool, which apparently:

is the OpenSource industry standard, high performance data logging and graphing system for time series data. Use it to write your custom monitoring shell scripts or create whole applications using its Perl, Python, Ruby, TCL or PHP bindings.


JRobin is a Java re-write (not an RRDTool binding). It can do things like:


See the JRobin Gallery for more.

Wish I’d found out about this a few years back.

Tags: Tools


Looks like an interesting tool to keep in mind:

The goal of JavaMelody is to monitor Java or Java EE applications servers in QA and production environments. … it is a tool to measure and calculate statistics on real operation of an application depending on the usage of the application by users.

There’s a Grails plugin as well, which is icing on the cake.

Tags: Grails, Programming, Tools