Jul 28 2011

ACM Overview of BI Technology misleads on CEP

The latest edition of the Communications of the ACM had an article covering CEP from the BI perspective, written by researchers from Microsoft and HP. On Complex Event Processing they write:

The competitive pressure of today’s businesses has led to the increased need for near real-time BI. The goal of near real-time BI (also called operational BI or just-in-time BI) is to reduce the latency between when operational data is acquired and when analysis over that data is possible. … A class of systems that enables such real-time BI is Complex Event Processing (CEP) engines…

They then quickly make the mistake of assuming CEP = Event Stream Processing (ESP):

Applications define declarative queries that can contain operations over streaming data such as filtering, windowing, aggregations, unions, and joins. The arrival of events in the input stream(s) triggers processing of the query. These are referred to as “standing” or “continuous” queries …

Of course, rule-based CEP (using constructs like Event Condition Action rules), and indeed other languages and algorithms, can also be used to define aggregate events to support real-time BI tasks. Oops.
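To make the rule-based alternative concrete, here is a minimal sketch of Event Condition Action rules deriving an aggregate event without any continuous query. Everything in it is an assumption for illustration: the `Event`, `Rule` and `RuleEngine` names, the "LoginFailed" scenario and the threshold of three are hypothetical, and the naive loop over rules stands in for what a real engine would do with something like Rete.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    kind: str
    payload: dict


@dataclass
class Rule:
    """Event Condition Action: on an event of kind `on`, if `condition` holds, run `action`."""
    on: str
    condition: Callable[[Event, Dict], bool]
    action: Callable[[Event, Dict], List[Event]]


class RuleEngine:
    """Hypothetical toy engine: scans all rules per event (no Rete network)."""

    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.memory: Dict = {}          # working state shared by the rules
        self.emitted: List[Event] = []  # every derived event, in order

    def assert_event(self, event: Event) -> None:
        queue = deque([event])
        while queue:
            current = queue.popleft()
            for rule in self.rules:
                if rule.on == current.kind and rule.condition(current, self.memory):
                    derived = rule.action(current, self.memory)
                    self.emitted.extend(derived)
                    queue.extend(derived)  # derived events can trigger further rules


def count_failure(event: Event, memory: Dict) -> List[Event]:
    # Action: update the per-user counter and publish the new count as an event.
    user = event.payload["user"]
    memory[user] = memory.get(user, 0) + 1
    return [Event("FailureCount", {"user": user, "failures": memory[user]})]


def raise_alert(event: Event, memory: Dict) -> List[Event]:
    # Action: emit the aggregate ("complex") event.
    return [Event("SuspiciousLogin", dict(event.payload))]


rules = [
    Rule("LoginFailed", lambda e, m: True, count_failure),
    Rule("FailureCount", lambda e, m: e.payload["failures"] == 3, raise_alert),
]
engine = RuleEngine(rules)
for _ in range(3):
    engine.assert_event(Event("LoginFailed", {"user": "alice"}))
alerts = [e for e in engine.emitted if e.kind == "SuspiciousLogin"]
```

The second rule fires on an event that the first rule derived, which is the forward-chaining behaviour that lets rules build aggregate events out of simpler ones.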

There are several open technical problems in CEP; we touch upon a few of them here.

This should be interesting…

One important challenge is to handle continuous queries that reference data in the database (for example, the query references a table of customers stored in the database) without affecting near real-time requirements.

I’m pretty sure most stream processing engines provide some ability to do this (if you want to). But normally the last thing you want to do in (near) real-time business applications is waste time querying an RDBMS. Common practice is to first put the data you need into memory or a datagrid / cache, where access times are much lower.
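As a rough sketch of that common practice: load the reference table once, outside the event path, and then enrich each event with a plain in-memory lookup. The `customers` table, its columns and the function names are all assumptions for illustration, and a Python dict stands in for whatever datagrid or cache a real deployment would use.

```python
import sqlite3


def load_customers(conn):
    """One-off bulk load of reference data; runs before event processing starts."""
    rows = conn.execute("SELECT id, name, tier FROM customers")
    return {cid: {"name": name, "tier": tier} for cid, name, tier in rows}


def enrich(events, customers):
    """Join each event to cached customer data with a dict lookup (no SQL per event)."""
    for ev in events:
        ref = customers.get(ev["customer_id"])
        if ref is not None:  # inner-join semantics: unknown customers are dropped
            yield {**ev, **ref}


# Hypothetical reference table, held in an in-memory SQLite database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "gold"), (2, "Globex", "basic")],
)
customers = load_customers(conn)

orders = [{"customer_id": 1, "amount": 250.0}, {"customer_id": 3, "amount": 10.0}]
enriched = list(enrich(orders, customers))
```

The sketch leaves out cache refresh: if the customer table changes, something still has to push those changes into the cache, and this is usually handled as just another event stream.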

The problem of optimizing query plans over streaming data has several open challenges. In principle, the benefit of an improved execution plan for the query is unlimited since the query executes “forever.” This opens up the possibility of more thorough optimization than is feasible in a traditional DBMS. Moreover, the ability to observe execution of operators in the execution plan over an extended period of time can be potentially valuable in identifying suboptimal plans.

Again, I’m pretty sure all the main CEP technology providers have optimised CEP engines - for example TIBCO uses a high-performance version of the Rete algorithm. I’m pretty sure Microsoft’s stream processing engine has optimizations too!

Finally, the increasing importance of real-time analytics implies that many traditional data mining techniques may need to be revisited in the context of streaming data. For example, algorithms that require multiple passes over the data are no longer feasible for streaming data.

This is actually an issue for the analytics guys: continuous analytics do indeed imply computing statistical models an event at a time rather than “all at once against some vast data store”. The good news is that the analytics world has been doing some of this for some time, in order to accommodate complex processing of large data sets by batching and recombining data as it is processed.
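A small example of what “an event at a time” means in practice is Welford's single-pass algorithm for mean and variance, together with the standard pairwise formula for recombining two partial results. This is only a sketch: the `RunningStats` class and its method names are made up here, and real streaming analytics cover far more than these two statistics.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation into the statistics; no history is stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Recombine two partial results, e.g. from separate batches or nodes."""
        merged = RunningStats()
        merged.n = self.n + other.n
        if merged.n == 0:
            return merged
        delta = other.mean - self.mean
        merged.mean = self.mean + delta * other.n / merged.n
        merged._m2 = self._m2 + other._m2 + delta * delta * self.n * other.n / merged.n
        return merged


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)  # one event at a time, constant memory
```

The `merge` method is the “batching and recombining” half of the story: each batch is summarised independently and the summaries are combined, so no pass over the full data set is ever needed.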

The BI Overview also mentions several other “BI technology” aspects that are also often combined in CEP solutions: in-memory, distributed (map reduce a.k.a. “divide and conquer”), and analytics. So although it wasn’t emphasised in this paper, it seems the interesting development here is a certain amount of convergence in using these technologies together!

Jun 30 2011

Academic activity in CEP…

DEBS 2011 is an ACM conference too

… seems to be increasing if the rate of academic conferences related to this area is an indication. In Europe we have seen 3 recent announcements for the next 12 months:

If any academic-types are considering visiting Italy for the first event above, they could also listen to Google talking about using CP to solve allocation problems etc at the 17th International Conference on Principles and Practice of Constraint Programming, Perugia, Italy from 12-16th September 2011…

Feb 12 2010

Communications of the ACM on Data in Flight (or maybe DBAs in Fright)

Brenda Michelson over on ebizQ’s Business-Driven-Architect blog had an interesting post on an article published in Communications of the ACM titled “Data in Flight”. The article tries to explain the idea of “data in motion” and the benefits of event stream processing, presumably assuming an audience of database folks.

Brenda’s quotes from the article are probably a good place for a bit of commentary:

“The streaming query engine is a new technology … “
… if you mean “new” is 10 years old or more. But maybe the author means “new” to the presumed reader? Although I would have thought that readers of this particular journal would be well advised on such technology trends…

“CEP has been used within the industry as a blanket term to describe the entire field of streaming query systems.”
Actually CEP covers multiple types of event processing, including continuous queries on event streams. Is there something fundamentally different between an event stream (from some external source) and a “data stream” (for some internal source)? No, I don’t think so.

“This is regrettable because it has resulted in a religious war between SQL-based and non-SQL-based vendors…”
Religious war? Most people accept that there are many types of event processing languages, and these are best suited to different types of event operations - and we see this in a variety of leading CEP vendors. Indeed, 2 out of the top 3 market leading CEP vendors (including TIBCO as the market leader) do not rely on SQL-based continuous queries (although this is indeed available, for tasks like stream processing, in TIBCO BusinessEvents).

“… in overly focusing on financial services applications, has caused other application areas to be neglected.”
Overly focusing on financial services applications? Well, it is certainly true that many CEP vendors focus on algorithmic trading, probably contributing to some vendors’ eventual demise during the recent financial downturn. But TIBCO’s main CEP customer base is in transport, logistics, and telecom, closely followed by government and then financial services.

“Because of their shared SQL language, streaming query engines and relational databases can collaborate to solve problems in monitoring and realtime business intelligence. SQL makes them accessible to a large pool of people with SQL expertise.”
Well, continuous query languages tend to be loosely based on SQL, but have different semantics. For example, in TIBCO BusinessEvents Query Language (BQL), there is a policy statement that defines things like time window, resultset size, etc. Such a continuous query, executing asynchronously, is probably not something a synchronous SQL / stored procedure developer will be automatically comfortable with.

“…streaming query systems can support patterns such as enterprise messaging, complex event processing, continuous data integration, and new application areas that are still being discovered.”

Well, this is truly where the ACM article reviewers missed some hype. As a subset of CEP, for sure, continuous queries “support” operations such as routing events, identifying event patterns, merging event data, and so forth. But so do other event processing technologies (and TIBCO customers do these with both production rules and queries…).

So one might conclude that this article is some sort of clever marketing from a stream-processing start-up. Which, indeed, it probably was - and all kudos to them for their success in getting this published in ACM. I believe it was aimed at encouraging database-users out of their “transactional shells” into the “real-time world of events” - which is possibly a frightful proposition to the CIOs and DBAs used to the staid and stable world of “data at rest” (and presumably on a database somewhere).

And hopefully someone (maybe from EPTS?) will counter with a more balanced article on event processing applications and use cases in the near future.

Jul 13 2009

DEBS’09: woe (or whoa) on the terminology of events, and other observations

At DEBS last week, the EPTS Language and Use Case working groups presented tutorials, and the EPTS Reference Architecture group (co-chaired by TIBCO) met for a useful catch-up session. But sorely missed were the “Glossary-ers”, tasked with standardizing the terminology used in event processing applications and systems. The importance of this came to light when a discussion started up about whether “incidents” were events (which of course they are, although they may not be detectable at the time they occur). Opher Etzion (from IBM Research) covered some more of this in his discussion on the Use Case tutorial.

Interestingly, there is a whole ITIL section on the process of Incident Management, which seems yet another application area for Complex Event Processing (at least for the detection part: other aspects may also include decision management and process management).

Other observations from DEBS were that:

  • Compared to previous years, there was much more focus on event processing rather than the (possibly simpler, probably more established) aspects of pub-sub middleware.
  • ACM accreditation seems to have done the conference no harm, and indeed seems to have made the organisers’ lives easier.
  • There was little in the way of progress on standards, with little or no mention of PRR or RIF; at least the Siemens CEP team were progressing on an interesting project using BMM, and were interested in the proposed EMP. The latter is awaiting more interest, including that likely to result from a supposedly planned link-up between the EPTS and OMG.
  • Was it just me, or did there seem to be a dominance of attendees (and presenters) from Germany? Is the land of “Vorsprung durch Technik” stealing a lead in IT by recognizing the advantages of event processing over conventional data processing?

Next year DEBS’10 moves back to Europe to the tranquil quads (and incorrect punting technique) of Cambridge University. Attendees can probably look forward to cries of “Ich habe sich in den Fluss!” or somesuch…
