
Category: XTP

Nov 20 2011

Event Processing at the Large Hadron Collider

Earlier this month Dr Neil Geddes gave a fascinating presentation at the BCS on “Data Processing at the Large Hadron Collider”, describing how LHC experiments create 1 billion events per sec, of which they can record in detail 100 events per sec. Neil’s backgrounder on particle physics was certainly worthy of an award for the best Dummies’ Guide, covering everything from molecules and atoms to leptons, mesons, baryons etc. The role of the LHC is to help discover new particles like the Higgs boson, as current astronomical observations don’t fit current physics theory (for example, the rotational speeds of stars indicate some unknown gravitational force on galaxies).

The LHC uses a proton collider to create new particles; the approximate rates of the experiment are:

  • 40MHz of “proton bunch” collisions
  • which give 10^9 Hz proton collisions
  • which give 10^-5 Hz particle production

So one part of the LHC experiments is to do with creating these particles, and the other part is “Complex Event Detection”: detecting and tracking the particles that fly out due to the proton collisions, and then doing data processing to try to reconstruct the “collision event” as it happened…

From an event detection perspective, as one would expect, the detector hardware is complex and layered to detect different types of particles:

  1. Silicon CCD strips (similar conceptually to a digital camera sensor) providing 66M channels (c.f. camera pixels) detecting at 45MHz; these measure the curvature of particle tracks in a magnetic field, from which particle momentum can be calculated
  2. Calorimeter: measures the energy of particles by their absorption (and subsequent heat generation)
  3. Lead tungstate crystals which create measurable light when heavier particles are absorbed, using layers of brass and crystals (with the brass apparently sourced from redundant Soviet artillery shells!)
  4. Drift chambers that provide the same function as the layer-1 silicon strips but are much larger and hence with fewer readout channels

The instrumentation of these sensors also has to deal with the issue of detector latency, where for example a single calorimeter reading could include the impacts of many events. The design of the sensors tries to mitigate this as much as possible - for example the average occupancy of the silicon detectors is designed to be 2%.

In terms of event rates, the numbers are staggering: 1 Bn proton-proton interactions per sec means 1000 particle tracks created every 25 ns, which leads to 100Pb of event data per sec, which needs to be reduced to 100Mb per sec. So the approach in event processing is to make successively more complex decisions on successively lower data rates. These reductions are:

  1. 40MHz reduced to 100kHz via custom electronics
  2. 100kHz reduced to 1kHz via initial processing
  3. 1kHz reduced to O(100Hz), or 3-10ms per event.
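As a rough check on the arithmetic, the three stages above can be written down and the rejection factor of each computed. This is only a sketch using the rates quoted in the list; the stage label "final selection" is my own placeholder, as the talk did not name the third stage.

```python
# Trigger stages as (label, input rate in Hz, output rate in Hz),
# using the rates quoted in the list above.
STAGES = [
    ("custom electronics", 40e6, 100e3),
    ("initial processing", 100e3, 1e3),
    ("final selection", 1e3, 100.0),  # placeholder label, not from the talk
]

def reduction_factors(stages):
    """Return (label, input rate / output rate) for each stage."""
    return [(label, rate_in / rate_out) for label, rate_in, rate_out in stages]

def overall_reduction(stages):
    """Factor between the first stage's input and the last stage's output."""
    return stages[0][1] / stages[-1][2]

# Time between bunch crossings at 40 MHz, in nanoseconds.
bunch_spacing_ns = 1e9 / 40e6

for label, factor in reduction_factors(STAGES):
    print(f"{label}: keeps 1 in {factor:,.0f}")
print(f"overall: keeps 1 in {overall_reduction(STAGES):,.0f}")
print(f"bunch spacing: {bunch_spacing_ns} ns")
```

So each stage only has to cope with what the cheaper stage before it lets through, and the 25 ns figure quoted earlier is simply the period of the 40MHz bunch-crossing rate.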

For one of the LHC experiments (CMS; the others are known as ATLAS, ALICE and LHCb) the event processing is done via a PC grid (up to 150K CPUs) using the approach of (a) identifying high-energy particles and (b) having the network route each event’s data to a specific process. They use a huge disk buffer at the experiment source in case of network failure, but otherwise distribute processing across a cross-Europe WAN (such as the CERN-UK connection handling 10Gb per sec).
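The two-step approach, filter then route, might look something like the sketch below. To be clear, this is my own illustration and not how CMS actually does it: the energy threshold, the worker count, the event layout and the hash-based routing are all assumptions.

```python
import zlib

ENERGY_THRESHOLD_GEV = 20.0  # hypothetical cut, not a figure from the talk
NUM_WORKERS = 4              # hypothetical number of worker processes

def is_interesting(event):
    """Step (a): keep events with at least one high-energy particle."""
    return any(e >= ENERGY_THRESHOLD_GEV for e in event["energies_gev"])

def route(event, num_workers=NUM_WORKERS):
    """Step (b): pick a worker deterministically from the event id."""
    return zlib.crc32(str(event["id"]).encode()) % num_workers

def dispatch(events, num_workers=NUM_WORKERS):
    """Filter events, then append each surviving event id to its worker's queue."""
    queues = {worker: [] for worker in range(num_workers)}
    for event in events:
        if is_interesting(event):
            queues[route(event, num_workers)].append(event["id"])
    return queues

sample_events = [
    {"id": 1, "energies_gev": [5.0, 25.0]},
    {"id": 2, "energies_gev": [1.0]},
    {"id": 3, "energies_gev": [50.0]},
]
print(dispatch(sample_events))
```

Routing on a hash of the event id means all data for one event lands on the same process, without any central coordinator having to remember where it sent things.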

Interestingly they have found that:

  • the network has proved much better than expected so the nodes are increasingly being used for data caches
  • CPU use has more than doubled in the last ~18 months

Also interesting is how much more hype there is for “big data” (MapReduce, Hadoop, et al) versus developments in “extreme event processing” like that done at LHC.

Note: the presenter used the term “data processing” whereas I deliberately used the term “event processing”. Actually both are involved: event detection and initial processing, then distribution / storage and ~batch-type data processing.

May 16 2010

TUCON2010: Keynote - Reliance Communications

Dr. Sumit Chowdhury, CIO of Reliance Communications, was another keynote presenter and CEP user at TUCON this year. Reliance is one of the biggest telco operators in the world, operating mostly in Asia.

Sumit viewed the evolution of organisations as being matched by their core IT approaches: the move from simple to complex organisation (mostly driven by size) is reflected in an IT evolution from historic -to- responsive -to- predictive -to- analytic. To explain this, Sumit compared business organisations with organisms, with stimuli resulting in reflex operations or, for more difficult stimuli, being passed to the brain. The IT equivalent is that transactions are typically handled by preprogrammed, reflexive business systems… but analytics lead to more intelligence, which in turn leads to more evolved “reflex actions”…

Reliance handle 330k channel payments per day and deal with 1M calls per day, collecting events such as “phone recharge orders”. As these transactions occur too fast to rely on getting the full history (i.e. context) for customers in an on-line database transaction, they prime their CEP system with historic customer information, allowing them to enable policies like “for every 4th phone recharge by a customer, reward with a free 30mins calltime”. Effectively the state of each customer is managed in TIBCO BusinessEvents state machines, and they exploit declarative rules, which are easier to change via a dashboard.
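To make the “every 4th recharge” policy concrete, here is a minimal sketch in plain Python. It is not BusinessEvents code, and the class and method names are hypothetical; it only illustrates the idea of priming in-memory state with history so that each event can be decided without a database lookup.

```python
from collections import defaultdict

REWARD_EVERY = 4     # reward on every 4th recharge, per the policy quoted above
REWARD_MINUTES = 30  # free call time granted

class RechargeTracker:
    """Per-customer recharge counter held in memory.

    A hypothetical stand-in for the per-customer state machine.
    """

    def __init__(self, historic_counts=None):
        # Prime the state with historic counts up front, so that no
        # database transaction is needed while events are processed.
        self.counts = defaultdict(int, historic_counts or {})

    def on_recharge(self, customer_id):
        """Process one recharge event; return free minutes earned (0 if none)."""
        self.counts[customer_id] += 1
        if self.counts[customer_id] % REWARD_EVERY == 0:
            return REWARD_MINUTES
        return 0

tracker = RechargeTracker(historic_counts={"alice": 3})
print(tracker.on_recharge("alice"))  # alice's 4th recharge overall
print(tracker.on_recharge("bob"))    # bob's 1st recharge
```

Keeping the threshold and reward as named parameters rather than burying them in the logic is the nearest this sketch gets to the “change it via a dashboard” point: the policy can be adjusted without touching the event handling.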

Sumit ended with some key requirements for successful business solutions:

  1. business control (e.g. ready adjustment of business rules, policies and decisions)
  2. scalability for business growth (e.g. ability to re-architect for multiple-fold improvements in throughput)
  3. context-aware (e.g. access to any necessary metadata and history for decision making)
  4. real time (e.g. making decisions in the timescale required by the customer - “customer time”)

This was an excellent keynote - covering aspects of business philosophy as well as technology trends. Reliance have been using TIBCO BusinessEvents for eXtreme Transaction Processing type applications for a while and have been instrumental in some of the scalability enhancements in past releases.

Mar 10 2010

Up-Scale Your Apps with Distributed Caching

…was the subject of today’s Forrester-IBM webinar on distributed cache technology, with both Forrester and IBM citing CEP and EDA, amongst others, as users of this technology. The overriding driver for this tech is eXtreme Transaction Processing, which we might just refactor as eXtreme Event Processing for the purposes of this blog!

One minor quibble: John Rymer of Forrester did the introduction and in doing so classified the cache market into .NET, Java and NoSQL camps, with TIBCO placed in the Java camp. This might seem a fair classification of a complex market area, but consider the 2 relevant TIBCO distributed cache offerings:

  • TIBCO BusinessEvents, although Java-based, is more accurately described as a CEP product that embeds a distributed cache - it wouldn’t normally appear on a vendor list of distributed cache technologies;
  • TIBCO ActiveSpaces is more accurately described as a data grid, but has .NET, C and Java interfaces. I’m sure other caching / data grid products have similar multiple interfaces - after all the client is just “an interface” to the cache / grid.

Spare a thought in passing, though, for the OODBMS guys. Amongst this buzz about data grids and caching, I notice the Forrester blog is reporting that the Progress guys (disclosure: a TIBCO competitor in some areas) are now considering their ObjectStore OODBMS a “legacy platform”. Plus ça change, perhaps.

Oct 21 2009

CEP-based Policy Management for High Performance Service Gateways

Thanks to Dr Mark Darbyshire from TIBCO’s Vertical Solutions group for the link to an interesting paper on a “Policy Appliance Reference Model” by K A Taipale, otherwise referred to by the obvious abbreviation “PARM”.

The paper describes PARM as: …an enterprise architecture for information sharing and knowledge management … based on policy appliances (technical control mechanisms to enforce rules) interacting with smart data … and intelligent agents… to reconcile, enforce, and monitor agreed information management policies for information security, data quality, and privacy protection across heterogeneous information sharing systems and networks.

As seems usual, the last sentence is a summary of the description and reads best: [PARM] supports policy-based information management processes through rules-based processing, selective disclosure, and accountability and oversight.

Although PARM is not directed at event processing, it should also be relevant (if not more so) to “data in motion”. Indeed it has many similarities to a CEP-based solution, available on request from Dr Darbyshire’s team at TIBCO, that exploits TIBCO BusinessEvents to provide a service gateway with embedded policy management. It is currently deployed for high-performance telco problems but is certainly suitable for handling SOA governance in SmartGrid, government, finance and transport type domains too.

Apr 21 2009

The Model2Agent Approach to configuring distributed systems

One of the interesting requirements for XTP/XEP (as mentioned in the previous blog posting) is that such performance (and the need for fault tolerance / failover) requires co-operation across multiple systems, ideally generalised in a multi-agent architecture [*1]. The problem here is usually in re-factoring to change performance characteristics as the project develops, to optimize the architecture to the performance needs, available compute nodes, and network (as well as changes to event throughputs).

One solution is to take the agent-based approach, where agents can be re-configured (through model parameters) very simply. Agents can:

  • execute declarative production rules “as required”, removing most of the need to specify / re-engineer (/ debug etc) “entry” and “exit” points;
  • use queries to partition data (/event objects) and process loads to separate agents;
  • rely on shared distributed cache to provide a “co-operative model” of events and data for processing.

The re-factoring / re-architecting process is, to some extent, as simple as specifying which rulesets are assigned to which agents, and defining more or fewer agents as required:

  a. Revise named agents (aka agent types and roles)
  b. Re-allocate rulesets (or queries) to appropriate agents
  c. Deploy (or re-deploy, with appropriate care) in numbers required.
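Steps a to c above can be sketched as a small model, shown here in plain Python. This is emphatically not the BusinessEvents configuration format; the class, function, agent and ruleset names are all made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AgentType:
    name: str        # hypothetical role name, e.g. "router"
    base_type: str   # "inference" or "query"
    rulesets: list = field(default_factory=list)
    instances: int = 1

def deployment_plan(agent_types):
    """Step c: expand agent types into named instances with their rulesets."""
    plan = {}
    for agent in agent_types:
        for i in range(agent.instances):
            plan[f"{agent.name}-{i}"] = list(agent.rulesets)
    return plan

def unassigned_rulesets(all_rulesets, agent_types):
    """Rulesets that no agent type executes, i.e. gaps in the event flow."""
    assigned = {r for agent in agent_types for r in agent.rulesets}
    return sorted(set(all_rulesets) - assigned)

ALL_RULESETS = ["preprocess", "route", "process"]

# Single-node layout: one inference agent runs everything.
single = [AgentType("all-in-one", "inference", list(ALL_RULESETS))]

# Multi-node layout: agents revised (step a), the same rulesets
# re-allocated (step b), and the processor scaled out to 3 instances.
multi = [
    AgentType("preprocessor", "inference", ["preprocess"]),
    AgentType("router", "inference", ["route"]),
    AgentType("processor", "inference", ["process"], instances=3),
]

print(deployment_plan(multi))
print(unassigned_rulesets(ALL_RULESETS, multi))
```

The point of the sketch is that the rulesets themselves do not change between the two layouts; only the model that assigns them to agents does.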

So the agent modeling aspect is to map the event processing elements first to agent base types (e.g. inference or query [*2]) and subtype (e.g. event preprocessor, event router, event/data processor, batched event/data processor, etc) and verify the event flow / data flow is complete and without contention. In this way you can quickly re-architect a TIBCO BusinessEvents application from single-node to multi-node, high performance to extreme performance…

Notes:

[*1] Agent representation and modelling is currently the subject of an OMG standard, AMP, which is under development.

[*2] CEP agents in BusinessEvents come in 2 forms: inference agents and query agents, roughly equating to CEP agents and Event Stream Processing agents…
