A startup called Emerald Logic claims it uses an evolutionary process to discover the best algorithm for predicting outcomes from any dataset. It might sound too good to be true, but the company says it has already notched successes, and it is one of several startups trying something similar.
Imagine you're a hospital that wants to improve patient survival rates. You have lots of data about your patients (their medical histories, EKG readings, room numbers, doctors, billing information and much more), and you certainly know whether they left alive or dead. Somewhere in all that data, the current thinking goes, there must be a formula that can predict what's going to happen.
It's not so much a big data problem as a complex data problem. According to Patrick Lilley, co-founder and CEO of Emerald Logic, a startup based in Aliso Viejo, Calif., the real world runs on systems with inputs and outcomes, but the complexity of the data we're generating makes it very difficult to identify which inputs lead to the best outcomes. He compares it to dropping a marble into a black box, eventually getting it out the other side, and then having to diagram what you think the inside looks like.
“The challenge there is you have to model what’s going in that system and you can’t often look inside,” he said.
Lilley claims his company can help find that formula. Its software, called FACET (short for Fast Collective Evolution Technology), tests tens of thousands of algorithms against a dataset to find the ones that best capture the relationships between the data and the end result. He calls the process "evolutionary computing" because the algorithms evolve, mate and migrate, and only the best one survives.
“This is a monkeys-on-typewriters sort of problem,” he acknowledged, referencing those theories about how long it would take a group of primates to reproduce the complete works of Shakespeare. The software doesn’t know anything about the field it’s working on or have any presuppositions about what’s in it. It’s simply trying to predict one thing from another, and he says it’s pretty effective.
FACET works by taking a sample of a dataset, generating tens of thousands of algorithms from it, and then testing them to determine the most predictive one. "Because it's evolution, it tends to wash away the variables and the math operators that are unimportant," Lilley explained. "… No more than eight things have ever mattered in any model we've ever generated."
Once the process is complete, the algorithm is tested against new data in order to ensure its predictions are still accurate. Emerald Logic delivers FACET as a cloud service, so customers really only pay for the algorithm it produces. Customers own all the intellectual property associated with it, and Emerald Logic charges based on the economic value of the problem it’s trying to solve.
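Emerald Logic hasn't published how FACET works beyond the description above, so what follows is only a generic sketch of the family of techniques Lilley describes, not FACET's code: island-model genetic programming (symbolic regression) in Python. Candidate formulas are built from input variables, constants and the operators +, - and *; they "mate" through subtree crossover, occasionally mutate, and "migrate" between islands, and only the lowest-error formulas survive. As in the process described above, the winner is evolved on a sample and then checked against rows it never saw. The toy dataset (ten inputs, with the outcome driven by `x0 * x3 + x7`), the population sizes, the depth cap and the mutation rate are all hypothetical choices made for illustration.

```python
import math
import random

MAX_DEPTH = 6  # hypothetical cap on formula size
# Hypothetical operator set; evolution decides which of these are worth keeping.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(rng, n_vars, levels):
    # A candidate formula: ("var", i), ("const", c) or (op, left, right).
    if levels == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("var", rng.randrange(n_vars))
        return ("const", rng.uniform(-2, 2))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, n_vars, levels - 1), random_tree(rng, n_vars, levels - 1))

def evaluate(tree, x):
    if tree[0] == "var":
        return x[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def mse(tree, rows):
    # Fitness: mean squared prediction error (lower is better); overflow counts as worst.
    err = sum(d * d for d in (evaluate(tree, x) - y for x, y in rows)) / len(rows)
    return err if math.isfinite(err) else float("inf")

def subtrees(tree, path=()):
    # Yield (path, node) for every node so crossover and mutation can target any part.
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def tree_depth(tree):
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2])) if tree[0] in OPS else 1

def crossover(rng, a, b):
    # "Mating": graft a random piece of parent b into a random spot in parent a.
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    child = replace_at(a, path, donor)
    return child if tree_depth(child) <= MAX_DEPTH else a

def mutate(rng, tree, n_vars):
    # Swap a random node for a fresh random subtree.
    path, _ = rng.choice(list(subtrees(tree)))
    child = replace_at(tree, path, random_tree(rng, n_vars, 2))
    return child if tree_depth(child) <= MAX_DEPTH else tree

def tournament(rng, pop, fits, k=3):
    return pop[min(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]

def evolve(train, n_vars, islands=3, pop_size=30, generations=20, seed=0):
    rng = random.Random(seed)
    pops = [[random_tree(rng, n_vars, 3) for _ in range(pop_size)] for _ in range(islands)]
    history = []  # best training error at the start of each generation
    for gen in range(generations):
        fits = [[mse(t, train) for t in pop] for pop in pops]
        history.append(min(min(f) for f in fits))
        new_pops = []
        for pop, fit in zip(pops, fits):
            children = [pop[min(range(pop_size), key=lambda i: fit[i])]]  # elitism: keep the best
            while len(children) < pop_size:
                child = crossover(rng, tournament(rng, pop, fit), tournament(rng, pop, fit))
                if rng.random() < 0.2:
                    child = mutate(rng, child, n_vars)
                children.append(child)
            new_pops.append(children)
        if gen % 5 == 4:
            # "Migration": each island's best replaces the worst on the next island (a ring).
            bests = [min(p, key=lambda t: mse(t, train)) for p in new_pops]
            for i, p in enumerate(new_pops):
                worst = max(range(pop_size), key=lambda j: mse(p[j], train))
                p[worst] = bests[i - 1]
        pops = new_pops
    best = min((t for p in pops for t in p), key=lambda t: mse(t, train))
    history.append(mse(best, train))
    return best, history

def variables_used(tree):
    # Inputs the surviving formula still relies on; the others have been "washed away".
    return sorted({node[1] for _, node in subtrees(tree) if node[0] == "var"})

rng = random.Random(42)
xs = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(140)]
data = [(x, x[0] * x[3] + x[7]) for x in xs]  # toy outcome: only 3 of 10 inputs matter
train, holdout = data[:100], data[100:]  # evolve on a sample, then check unseen rows
best, history = evolve(train, n_vars=10)
print("holdout MSE:", mse(best, holdout))
print("variables used:", variables_used(best))
```

Printing `variables_used(best)` shows which inputs the surviving formula leans on. Nothing guarantees that a run like this isolates exactly the true drivers, which is why the held-out check at the end matters: a formula that merely memorized the training sample should show a noticeably worse holdout error.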
A hot field for startups, actually
All of this probably sounds a little too good to be true, and maybe it is, but Emerald Logic is really just putting its own spin on an idea that several other startups are also pushing. They include BeyondCore, with its service for finding the variables most statistically relevant to a given outcome, and Emcien. There's also Ayasdi, which runs thousands of machine learning algorithms to discover and then visualize connections within massive datasets.
Emerald Logic’s promise actually sounds similar to that of Nutonian, a startup from former Cornell Creative Machine Labs researcher Michael Schmidt that claims its Eureqa software can “calculate laws of physics” present in business data.
Each approach raises the question of whether anyone should trust software to decide what's important in their data, but users don't have to take the results on faith. Once data scientists or business analysts see what the software has come up with, they can dig into the variables, examine the connections, and decide whether they buy it. They can also run tests to determine whether there's something worth investigating further.
Besides, Lilley argued, he has proof that FACET works. The company worked with King's College London on identifying markers for Alzheimer's disease, and FACET highlighted 14 out of a list of 11,000 possibilities. Half of them had already been mentioned in prior literature, about a quarter had been suspected as possible markers, and the remainder were novel to FACET. It would have been easy to ignore what the software found had it not validated those previous findings and inklings, Lilley said.
According to a February 2013 press release announcing that partnership, “Using these markers, plus APOE genetic information and demographics, the collaborators produced a mathematical classifier of 94% accuracy in distinguishing Alzheimer’s study subjects from controls or with those with mild cognitive impairment.”
In finance, FACET routinely finds that how a company incorporates is a strong predictor of whether it will succeed. In consumer loans, it has found that “effectively, liars tell longer stories,” Lilley said.
And in fact, he noted, Emerald Logic is his third startup with the same co-founder and FACET is kind of just an iteration on the technologies of the previous two. The first, called Digital Transit, used a genetic algorithm to do over-the-air software updates for mobile phones. That company merged with Bitfone in 2001, which HP acquired in 2006. In January 2014, Qualcomm bought the associated patents from HP.
The second startup, called Deep Six Technologies, generated decision trees based on data about email servers in order to do spam detection. The two founders have been working on Emerald Logic since 2011.
Whether or not FACET — or anything of its ilk — turns out to be a magic bullet, they’re all working under the same assumption that has driven the push of big data technologies and data science into the mainstream. Namely, that if data really does contain answers to tricky problems, there’s no way a person can figure out all the right questions to ask to find those answers among thousands of different variables. At some point, some parts of the process must be automated in order to steer people in the right direction.
This is why Lilley refers to Emerald Logic and FACET as "artificial imagination" rather than "artificial intelligence." "The more expertise someone has in a field," he explained, "the more they know better and the less they sort of look around.
“… This method is pretty sideways. It’s not the way people are used to thinking about the problem.”
Feature image courtesy of Shutterstock user phipatbig.