Reliant’s paper-scouring AI takes on science’s data drudgery

AI models have proven capable of many things, but what tasks do we actually want them doing? Preferably drudgery — and there’s plenty of that in research and academia. Reliant hopes to specialize in the kind of time-consuming data extraction work that’s currently a specialty of tired grad students and interns.

“The best thing you can do with AI is improve the human experience: reduce menial labor and let people do the things that are important to them,” said CEO Karl Moritz. In the research world, where he and co-founders Marc Bellemare and Richard Schlegel have worked for years, literature review is one of the most common examples of this “menial labor.”

Every paper cites previous and related work, but finding these sources in the sea of science is not easy. And some, like systematic reviews, cite or use data from thousands.

For one study, Moritz recalled, “The authors had to look at 3,500 scientific publications, and a lot of them ended up not being relevant. It’s a ton of time spent extracting a tiny amount of useful information — this felt like something that really ought to be automated by AI.”

They knew that modern language models could do it: one experiment put ChatGPT on the task and found that it was able to extract data with an 11% error rate. Like many things LLMs can do, it’s impressive but nothing like what people actually need.

“That’s just not good enough,” said Moritz. “For these knowledge tasks, menial as they may be, it’s very important that you don’t make mistakes.”

Reliant’s core product, Tabular, is based on an LLM in part (LLaMa 3.1), but augmented with other proprietary techniques, is considerably more effective. On the multi-thousand-study extraction above, they said it did the same task with zero errors.

What that means is: you dump a thousand documents in, say you want this, that, and the other data out of them, and Reliant pores through them and finds that information — whether it’s perfectly labeled and structured or (far more likely) it isn’t. Then it pops all that data and any analyses you wanted done into a nice UI so you can dive down into individual cases.

“Our users need to be able to work with all the data all at once, and we’re building features to allow them to edit the data that’s there, or go from the data to the literature; we see our role as helping the users find where to spend their attention,” Moritz said.

This tailored and effective application of AI — not as splashy as a digital friend but almost certainly much more viable — could accelerate science across a number of highly technical domains. Investors have taken note, funding a $11.3 million seed round; Tola Capital and Inovia Capital led the round, with angel Mike Volpi participating.

Like any application of AI, Reliant’s tech is very compute-intensive, which is why the company has bought its own hardware rather than renting it a la carte from one of the big providers. Going in-house with hardware offers both risk and reward: you have to make these expensive machines pay for themselves, but you get the chance to crack open the problem space with dedicated compute.

“One thing that we’ve found is it’s very challenging to give a good answer if you have limited time to give that answer,” Moritz explained — for instance, if a scientist asks the system to perform a novel extraction or analysis task on a hundred papers. It can be done quickly, or well, but not both — unless they predict what users might ask and figure out the answer, or something like it, ahead of time.

“The thing is, a lot of people have the same questions, so we can find the answers before they ask, as a starting point,” said Bellemare, the startup’s chief science officer. “We can distill 100 pages of text into something else, that may not be exactly what you want, but it’s easier for us to work with.”

Think about it this way: if you were going to extract the meaning from a thousand novels, would you wait until someone asked for the characters’ names to go through and grab them? Or would you just do that work ahead of time (along with things like locations, dates, relationships, etc.) knowing the data would likely be wanted? Certainly the latter — if you had the compute to spare.

This pre-extraction also gives the models time to resolve the inevitable ambiguities and assumptions found in different scientific domains. When one metric “indicates” another, it may not mean the same thing in pharmaceuticals as it does in pathology or clinical trials. Not only that, but language models tend to give different outputs depending on how they’re asked certain questions. So Reliant’s job has been to turn ambiguity into certainty — “and this is something you can only do if you’re willing to invest in a particular science or domain,” Moritz noted.

As a company, Reliant’s first focus is on establishing that the tech can pay for itself before attempting anything more ambitious. “In order to make interesting progress, you have to have a big vision but you also need to start with something concrete,” said Moritz. “From a startup survival point of view, we focus on for-profit companies, because they give us money to pay for our GPUs. We’re not selling this at a loss to customers.”

One might expect the firm to feel the heat from companies like OpenAI and Anthropic, which are pouring money into handling more structured tasks like database management and coding, or from implementation partners like Cohere and Scale. But Bellemare was optimistic: “We’re building this on a groundswell — Any improvement in our tech stack is great for us. The LLM is one of maybe eight large machine learning models in there — the others are fully proprietary to us, made from scratch on data propriety to us.”

The transformation of the biotech and research industry into an AI-driven one is certainly only beginning, and may be fairly patchwork for years to come. But Reliant seems to have found a strong footing to start from.

“If you want the 95% solution, and you just apologize profusely to one of your customers once in a while, great,” said Moritz. “We’re for where precision and recall really matter, and where mistakes really matter. And frankly, that’s enough, we’re happy to leave the rest to others.”

#Reliants #paperscouring #takes #sciences #data #drudgery

Leave a Comment Cancel reply