Looking for Hidden Gems in Scientific Literature: The Promise and Practice of Literature-Based Discovery
Oct 6, 2025
But it really makes no difference whether the unknown lies in the lap of Nature or, instead, is buried among the pages of worthless manuscripts read by no one; because an idea that has not entered the bloodstream of science, and does not circulate seminally in it, in practice does not exist for us.
Stanisław Lem, His Master’s Voice
In the expanding universe of human knowledge, there is intellectual dark matter. Some of it is lost in time without trace, some is only accessible within private institutions (proprietary knowledge) and some can only be learned through direct experience but cannot be legibly recorded (tacit knowledge). Lost knowledge is an unknown unknown. Proprietary and tacit knowledge are a known unknown. But there is also the unknown known: not lost, but forgotten and recoverable knowledge contained in obscure or neglected published research papers. Don Swanson, the founder of the field of literature-based discovery, referred to it as ‘undiscovered public knowledge.’
Why do research papers remain in obscurity, does it matter if they do, and what can be done to redeem them? They may be hard to find to begin with, if they’re published in poorly indexed or low visibility journals or in a foreign language. Alternatively, they may be relatively easy to find but still be neglected if they are ahead of their time and not recognized as important at the time of publication (such papers are called sleeping beauties). It could also be that obscure papers are truly of low quality and are thus justifiably neglected. (reference 1)
But undiscovered public knowledge can also be scattered across even well-read papers whose findings complement each other but are not explicitly linked. A classic example was pioneered by Don Swanson himself in the 1980s, in biology — a field new to him as a physicist by training. In the course of literature review, Swanson noticed that magnesium deprivation causes symptoms similar to those of migraine but that papers investigating either migraine or physiological effects of magnesium had never been jointly cited or connected before. He formulated a hypothesis that dietary magnesium supplementation can alleviate migraine, which was subsequently supported by clinical trials. Swanson also noted that, given the sheer volume of research literature, such connections are bound to exist but remain truly unknown until someone discovers them: (reference 2)
Undocumented connections arise neither by chance nor by design but as a result of the inherent connectedness within the physical or biological world; they are of particular interest because of their potential for being discovered by bringing together the relevant noninteractive literatures, like assembling pieces of a puzzle to reveal an unnoticed, unintended, but not unintelligible pattern. The fragmentation of science into specialties makes it likely that there exist innumerable pairs of logically related, mutually isolated literatures.
Scientific literature is vast. There are on the order of 100 million papers published to date. It is physically impossible for a single researcher, or a team of researchers, to be aware of, let alone study thoroughly, all existing literature within their own discipline, to say nothing of other disciplines. In light of this, and given the ‘inherent connectedness’ of natural phenomena that Swanson emphasized, there are bound to exist islands of knowledge in the published corpus that harbor latent connections between each other.
This is the promise of literature-based discovery (LBD) — to search for and reveal already existing, but still hidden, links between concepts, findings, questions and answers within scientific literature that would otherwise take much longer to stumble into. It is an attempt to build a whole elephant from the parts of it that are only known to researchers in different, disconnected fields and subfields.

Blind men and the elephant. From Martha Adelaide Holton & Charles Madison Curry, Holton-Curry readers, Rand McNally & Co. (Chicago), p. 108.
LBD can be viewed as part of the ecosystem of ‘assisted’ scientific discovery — computational tools and approaches that aid human researchers at various stages of the discovery process: literature review, hypothesis generation, experiment execution, and data analysis and interpretation. LBD comprises the first two stages: mining scientific literature for novel hypotheses. Other tools and models can be trained directly on raw data, scaling up and speeding up the manual work of unaided human researchers. For example, deep learning models trained on DNA sequences, protein sequences and 3D structures, neural spike trains, imaging data or mass spectrometry data learn to predict novel structures and biological functions. On the hardware side, robotic labs such as Emerald Cloud Lab and the University of Liverpool’s ‘mobile robotic chemist’ are being developed to automate wet lab work and run programmable experiments.
The first instances of literature-based discovery were accomplished by cumbersome manual screening of papers. But the field has moved much further since. With the primary medium of LBD being natural language, expanding the scope of the field required handling large amounts of text in an automated way. In response to this challenge, an array of methods has been developed in the fields of natural language processing, machine learning, and artificial intelligence.
These methods range from simple lexical statistics to automated reasoning over scientific literature with today’s large language models (LLMs) (see Appendix). Along the way, standardized databases, ontologies and knowledge graphs were built to organize the expanding domain-specific vocabularies (including gene interaction networks, biochemical pathways, and chemical reactions). Scaling to increasingly large bodies of knowledge, LBD methods saw a steady increase in algorithmic complexity over time.
But do we observe a corresponding increase in the number of literature-based discoveries made after Swanson’s seminal studies? LBD’s most successful recent application has been in the field of drug repurposing. At the beginning of the COVID-19 pandemic, in late January 2020, BenevolentAI conducted an exhaustive search for potential drugs with combined anti-viral and anti-inflammatory potency for treating acute cases of the disease. Within two days, using their in-house literature-based knowledge graphs and a suite of ML methods, researchers at BenevolentAI screened 378 drug candidates in silico, narrowing the list down to six. Of these, baricitinib was selected based on its safety profile and high affinity for its target, the AP2-associated protein kinase 1 (AAK1; an endocytosis regulator expressed in cells that are susceptible to the virus). This drug had previously been approved for treating rheumatoid arthritis. A promptly conducted clinical trial showed a large positive effect on patients (faster recovery and lower mortality), and baricitinib received FDA emergency authorization as early as November 2020.
Outside of biomedicine, LBD has been applied in some basic science domains. In materials science, it can make retrospective discoveries of functional materials like thermoelectrics when trained on the literature corpus (millions of abstracts) up to a certain cut-off year. That is, LBD can recover findings made after the training cut-off year, and we can verify the top-ranked candidates by checking the papers published in the following years. This is a testament to the core assumption of LBD: that scientific literature alone contains recoverable latent knowledge. But literature-based prospective discoveries in materials science, from the most up-to-date state of the literature into the future, have been lacking.
All in all, as a stand-alone approach, LBD still remains mostly a theoretical undertaking, despite the rich arsenal of computational methods it has at its disposal. Real-world discoveries that make it to clinical trials and approved drugs or basic discoveries in any discipline remain few and far between. Why is that?
First off, LBD has an evaluation problem. In computational, prediction-oriented fields like LBD, the output of a method is often a ranked list of candidates (hypotheses in this case). In order to be able to compare different methods within the field to each other and measure the field’s progress as a whole, we need shared scalable and reproducible benchmarks and evaluation methods. Considering that the real-world, experimental validation of LBD hypotheses, especially in biomedical sciences, is slow and costly, scalable in silico evaluation becomes even more important. Evaluation methods in LBD also formally define the target task: if a research community can agree on how to measure success, it can precisely define what LBD is trying to do.
This poses the question of what exactly counts as a literature-based discovery and what its boundaries are. What does establishing a hidden link between scientific concepts mean? According to one definition, ‘a discovery represents the linking of two or more concepts that had never previously been linked in order to produce novel, interesting, plausible, and intelligible knowledge’ (emphasis added). Such a definition leaves room for interpretation, making it challenging to apply analytical methods to it. Additionally, the definition of discovery may vary by domain (e.g. in social sciences vs biology).
For more than three decades since its founding, LBD evaluation has mostly relied on the same small set of discoveries as a benchmark: Swanson’s magnesium – migraine and Raynaud disease – fish oil links, along with a few others. This is the so-called replication evaluation: testing whether a new LBD method can re-discover known cases that were previously found manually (in other words, whether it can rank those cases highly).
Are these discoveries an appropriate gold standard for LBD? Their number is small to begin with, and even when a replication works, it doesn’t differentiate whether one method is quantitatively better than another. The test set covers only a tiny portion of biomedical sciences, and there is no representation of other disciplines. More fundamentally, evaluation in knowledge-based discovery is intrinsically hard, since LBD methods surface putatively new knowledge that has not yet been empirically tested. Validation against a known set of hand-picked discoveries is insufficient and provides a weak theoretical foundation for the field.
These limitations are partially addressed by an alternative approach, time-sliced evaluation. In this case, pairwise associations between concepts are not limited to the famous manually found set but are searched for across the entire scientific corpus, split at a certain cut-off time. Every pair of concepts that did not co-occur before the cut-off but starts to co-occur after it is treated as a positive; pairs that never co-occur provide negatives. Positive pairs ranked highly by an LBD method (using only pre-cutoff literature) are candidates for real discoveries.
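To make the procedure concrete, here is a minimal Python sketch of time-sliced evaluation. It assumes concept sets have already been extracted from each paper and that the method under test supplies a scoring function computed from pre-cutoff information only; the input format and the precision@k metric are illustrative choices, not a standard benchmark from the LBD literature.

```python
from itertools import combinations

def time_sliced_eval(papers, cutoff_year, score_fn, k=100):
    """papers: list of (year, set_of_concept_ids) tuples extracted from the corpus.
    score_fn: the LBD method under test; maps a concept pair (frozenset) to a score
    using only pre-cutoff information. Returns precision@k over post-cutoff links."""
    before, after, vocab = set(), set(), set()
    for year, concepts in papers:
        pairs = {frozenset(p) for p in combinations(concepts, 2)}
        if year <= cutoff_year:
            before |= pairs
            vocab |= set(concepts)
        else:
            after |= pairs

    positives = after - before                       # first co-occur after the cutoff
    # candidate space: every pre-cutoff concept pair not yet seen together
    candidates = [frozenset(p) for p in combinations(sorted(vocab), 2)
                  if frozenset(p) not in before]
    ranked = sorted(candidates, key=score_fn, reverse=True)[:k]
    return sum(pair in positives for pair in ranked) / k
```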
But this method has been beset by its own shortcomings. For one, mere co-mentions do not in and of themselves constitute a discovery – there’s no way around human curation. Most co-occurrences are trivial or noisy, like pairings with generic terms (‘diabetes’ and ‘blood sugar’ or many diseases and ‘inflammation’). A high score assigned by an LBD method thus doesn’t necessarily point to a novel, valuable and generative connection.
This inherent difficulty of establishing evaluation metrics explains why, historically, it has been easier to make a technological contribution to LBD by developing a new algorithmic method than to put together a high-quality annotated dataset for evaluation across the field.
Separately from the evaluation problem, it may be argued that language is ‘a lossy abstraction over reality’ compared to raw data, so LBD may be inherently unreliable. Swaths of tacit knowledge in scientific practice, as well as private explicit knowledge (the ‘known unknown’), remain beyond the purview of recorded literature and can only be acquired through in-person training. And the scientific literature is plagued by non-replicable findings and journal paywalls, further complicating its usefulness as a jumping-off point for generating new knowledge.
Despite its original promise, as LBD researchers themselves admit, there has been relatively little uptake of LBD methods by their intended end users, active scientists, and few if any large-scale collaborations between LBD researchers and scientists in other disciplines. Even within LBD itself, the field consists of many small and medium-sized groups with little collaboration among them. Ironically, LBD seems to suffer from the very problem it was originally meant to solve: islands of knowledge and isolated research communities that fail to connect.
Still, the scientific literature is a repository (however noisy and incomplete) of cumulative knowledge of humanity. We would be leaving a lot of value on the table by not digging this knowledge up and looking for latent connections and ideas that are waiting for their due appraisal.
What do large language models add to literature-based discovery?
Pre-LLM methods of literature-based discovery were developed to detect pairwise links (like drug → disease) or linear chains of concepts in the scientific literature (like drug → gene → pathway → disease in knowledge graphs). But such simple relations limit the range and expressivity of hypotheses that can be generated. A major novelty that LLMs bring is their ability to follow chains of thought in natural language, turning LBD into a reasoning task. LLMs can take in more inputs besides a starting concept, including problem context and constraints, and can, in principle, formulate hypotheses the way a human scientist would express them.
But there are obstacles on the way to reliable and valuable LLM-generated hypotheses. Hallucination is a major and inevitable concern because LLMs are trained to predict likely tokens based on prompts, and optimizing for plausibility does not guarantee truthful statements. One way to ground LLM outputs in existing literature is retrieval-augmented generation (RAG). Another approach is to implement self-critique where the model first drafts an answer to a prompt and then revises it, based on its own verification process (Chain-of-Verification, CRITIC, Self-RAG, Test-Time Diffusion Deep Researcher). But ultimately, LLM hypotheses have to be quality-controlled by human expert curation and empirical validation in the lab.
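As a rough illustration of how retrieval grounding and self-critique can be wired together, here is a Python sketch. The llm() and retrieve() callables are placeholders for whatever model API and literature index are actually available, and the prompts and two-round revision loop are assumptions for illustration, not a description of any specific published system.

```python
def generate_grounded_hypothesis(question, llm, retrieve, n_rounds=2):
    """Draft a hypothesis grounded in retrieved abstracts, then self-critique.

    llm(prompt) -> str and retrieve(query, k) -> list[str] are placeholders
    for whatever model API and literature index are available."""
    evidence = retrieve(question, k=10)
    context = "\n\n".join(evidence)

    draft = llm(f"Using only the evidence below, propose a testable hypothesis "
                f"for: {question}\n\nEvidence:\n{context}")

    for _ in range(n_rounds):
        # critic pass: flag unsupported claims and contradictions
        critique = llm(f"List claims in this hypothesis that are NOT supported "
                       f"by the evidence, and any contradictions:\n\n{draft}\n\n"
                       f"Evidence:\n{context}")
        # revision pass: rewrite the draft to address the critique
        draft = llm(f"Revise the hypothesis to address this critique, citing the "
                    f"evidence it relies on:\n\nHypothesis:\n{draft}\n\n"
                    f"Critique:\n{critique}")
    return draft
```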
In some research fields, earlier LBD methods are still preferable to LLMs, due to the idiosyncrasies of their datasets. For example, most sociologists still rely on static semantic embeddings when they study cultural shifts using historical texts, for a few reasons. The computations involved in constructing such embeddings are easy to interpret — they are based on pointwise mutual information in word associations. LLM embeddings are largely opaque and can’t be reliably traced to the statistical properties of the underlying training data.
Most importantly for study design, pre-trained LLMs are prone to ‘look-ahead’ bias, where the model’s analysis of a historical text is affected by its pre-training exposure to information from the future relative to that historical period. This leakage of future information renders LLM analysis nearly useless, and there are currently no reliable methods to make pre-trained models forget parts of their training data. While simpler semantic models can be relatively cheaply trained on separate historical corpora, compute costs make this infeasible for LLMs, at least in the context of academic research.
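For concreteness, here is a bare-bones sketch of the kind of PMI-based static embedding referred to above, built with NumPy: document-level co-occurrence counts are turned into a positive PMI matrix and compressed with SVD. In practice, window-based counts and various smoothing choices are common; this version is only meant to show why the pipeline is easy to interpret.

```python
import numpy as np
from itertools import combinations

def ppmi_svd_embeddings(docs, dim=100):
    """docs: list of tokenized documents. Returns (vocab, dense word vectors)."""
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}

    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for w1, w2 in combinations(doc, 2):          # document-level co-occurrence
            counts[idx[w1], idx[w2]] += 1
            counts[idx[w2], idx[w1]] += 1

    total = counts.sum()
    pw = counts.sum(axis=1, keepdims=True) / total   # marginal word probabilities
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts / total) / (pw @ pw.T))
    ppmi = np.nan_to_num(np.maximum(pmi, 0))         # keep only positive PMI

    U, S, _ = np.linalg.svd(ppmi)
    vectors = U[:, :dim] * np.sqrt(S[:dim])          # dense low-dimensional embeddings
    return vocab, vectors
```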
Could LLMs generate truly novel hypotheses?
In biomedical sciences to date, there have been few examples of genuine discoveries made by LLMs. Among them, Google’s AI Co-Scientist suggested an ingenious hypothesis (the highest-ranked among five it generated) about a mechanism of horizontal gene transfer among bacteriophages, after being exposed to publicly available research context on the problem. It turned out to be the same hypothesis that had already been experimentally validated, as a result of years-long and as yet unpublished work, by the research group that tested the AI Co-Scientist. Arguably, this is an impressive feat and a significant step toward AI-assisted scientific reasoning.
Other, more modest, emerging studies also document an LLM-assisted discovery process in which LLMs supply new hypotheses, followed by laboratory validation. Novel metabolomic findings in the yeast Saccharomyces cerevisiae and synergistic drug interactions in breast cancer cell lines count among such studies. But so far they have been limited to incremental findings in simple systems like cell cultures or microorganisms that allow for high-speed experimental testing of LLM-generated hypotheses.
But all in all, we haven’t witnessed a massive surge of breakthroughs that could be expected from models trained on all of human knowledge. Maybe there simply hasn’t been enough time to empirically validate all the potentially valuable LLM-generated hypotheses yet. Alternatively, could it be that LLMs lack some fundamental features of human thought that enable scientific creativity?
In her book The Creative Mind: Myths and Mechanisms, Margaret Boden considers three types of human creativity, which she defines as the ability to come up with ideas or artefacts that are ‘new, surprising and valuable.’ Combinatorial creativity generates unfamiliar pairings of known ideas. Such combinations can be seen in poetic imagery, humor, metaphor — and classic literature-based discovery. Exploratory creativity involves searching new regions within an existing conceptual space like painting, cuisine or a research area. In such a structured style of thought, finding a previously unnoticed route enriches that space and adds more detail to its map. Tracing new multi-hop chains of concepts along a knowledge graph is an example of exploratory thinking in LBD (from drug to disease or from a regulatory gene to its target).
In contrast, transformative creativity alters the concept-space itself and expands it beyond its pre-existing boundaries. Conceptual leaps are thoughts that ‘are now possible which previously (within the untransformed space) were literally inconceivable.’ These leaps require new representational primitives (‘bootstrapping’), or new representations of observed or imagined entities, that are absent from the language and need to be invented first. The bar here is very high and is reserved for peaks of human achievement: invention of calculus, Fourier analysis, linear perspective in painting, the periodic table, quantum mechanics, chemiosmotic theory (understanding cellular energetics), information theory, cracking the genetic code.
LLMs can excel in combinatorial and exploratory creativity which are possible within the existing vocabulary that they are trained on. The classic memorization vs. generalization/grokking trade-off (or phase transition) in LLMs maps onto these two types of creativity. Equipped with a rich repository of memorized knowledge, a model can produce potentially interesting concept combinations. Generalization, in contrast, implies that the model has developed a map of the conceptual space and can explore it beyond the known paths, though still remaining within it.
LLMs’ combinatorial and exploratory creativity shines in their ability to solve complex (IMO-level) math problems and assist human researchers in mathematical research. Terence Tao notes that there’s ‘a role for these tools in drawing out a user’s latent knowledge in a problem’, by ‘proposing reasonably relevant ideas that the user is expert enough to evaluate.’ In line with this, Scott Aaronson recently published a significant result in quantum complexity theory where a key technical step came from GPT5-Thinking — though arriving at the correct solution took several iterations, not unlike a conversation with a graduate student or a colleague. Undoubtedly, cases like this can considerably speed up the discovery process. As Aaronson puts it:
Right now, it [an LLM] almost certainly can’t write the whole research paper (at least if you want it to be correct and good), but it can help you get unstuck if you otherwise know what you’re doing, which you might call a sweet spot.
We are yet to witness truly transformative creativity in LLMs, however — scientific, mathematical or otherwise. But all three types of creativity involve not only generating new links, thought-paths or representations but also evaluating them and recognizing their importance. These two functions, of a generator and a critic, may need to be separated if they are to be implemented in an LLM-based system.
Is it possible to approach LLM creativity in a more systematic way?
As reasoning entities, LLMs ‘are frozen, unable to learn from experience, and … have no “default mode” for background processing, a source of spontaneous human insight.’ Gwern’s solution to remedy these shortcomings is very similar to the general logic of classic literature-based discovery:
a day-dreaming loop (DDL): a background process that continuously samples pairs of concepts from memory. A generator model explores non-obvious links between them, and a critic model filters the results for genuinely valuable ideas. These discoveries are fed back into the system’s memory, creating a compounding feedback loop where new ideas themselves become seeds for future combinations.
With both a generator and a critic module in place, such a ‘daydreaming’ model would in principle be able to fully automate LBD. It could regularly retrieve random sets of concepts from the continually updated scientific literature, run a ‘brainstorm’ prompt to explore possible links between them, and then let the critic module judge which of those links are worth pursuing further.
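A toy version of such a loop might look like the sketch below, where memory is just a set of concept strings and llm() is a placeholder for the model API; the prompts, scoring scale and threshold are arbitrary illustrative choices.

```python
import random

def daydream(memory, llm, n_iterations=1000, threshold=7):
    """Toy daydreaming loop: sample concept pairs from memory, let a generator
    prompt look for non-obvious links, let a critic prompt score them, and feed
    valuable ideas back into memory. llm(prompt) -> str is a placeholder."""
    for _ in range(n_iterations):
        a, b = random.sample(list(memory), 2)
        idea = llm(f"Brainstorm a non-obvious but plausible connection between "
                   f"'{a}' and '{b}'. Be specific and mechanistic.")
        verdict = llm(f"On a scale of 1-10, how novel, plausible and potentially "
                      f"valuable is this idea? Reply with a number only.\n\n{idea}")
        try:
            score = int(verdict.strip().split()[0])
        except (ValueError, IndexError):
            continue
        if score >= threshold:
            memory.add(idea)      # compounding loop: new ideas become future seeds
    return memory
```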
But implementing a daydreaming algorithm would be significantly more expensive than training the LLM itself (what Gwern calls ‘a daydreaming tax’). This is because background sampling of pairs of concepts is inherently wasteful – most pairs won’t be interesting. And there are no shortcuts around this, since if we already knew how to predict the value and interestingness of associations, there wouldn’t have been a need to traverse the entire space of possibilities to begin with. It may be that ‘the most far-flung and low-prior connections are the important ones,’ so none can be discarded a priori.
Donald Swanson’s classic magnesium–migraine link discovery required an unaided scholar to sift through research papers for weeks and months. A daydreaming loop in an LLM can potentially perform billions of such pairings in a much shorter time (hours?). The DDL algorithm could therefore realize the original promise of literature-based discovery – surfacing high‑value ‘unknown knowns,’ at scale.
Appendix
Evolution of computational methods used in literature-based discovery before LLMs
The first implementations of literature-based discovery, like Swanson’s magnesium–migraine association, are known as the ABC model. This model is based on the inference that if concept A is associated with concept B in one set of research papers, and B is associated with concept C in another, disjoint set of papers, then A is associated with C through the B-concept as an intermediate link. In Swanson’s example, migraine is concept A, magnesium is concept C, and the mechanisms connecting these terms – such as prostaglandins, vascular health, calcium channel blockers – are the B-concepts.
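In code, the ABC inference can be sketched as a simple co-occurrence exercise: collect the terms that co-occur with the A term (the B terms), follow those to their own co-occurring terms, and keep the C terms that never appear together with A. Scoring candidates by the number of bridging B terms is an illustrative simplification, not a faithful reproduction of Swanson's procedure.

```python
from collections import defaultdict

def abc_candidates(papers, a_term):
    """papers: list of sets of normalized terms, one set per paper/abstract.
    Returns C-term candidates ranked by how many B terms bridge them to A."""
    cooc = defaultdict(set)                    # term -> set of co-occurring terms
    for terms in papers:
        for t in terms:
            cooc[t] |= terms - {t}

    b_terms = cooc[a_term]                     # A-B links
    scores = defaultdict(int)
    for b in b_terms:
        for c in cooc[b]:                      # B-C links
            if c != a_term and c not in cooc[a_term]:   # A and C never co-occur
                scores[c] += 1                 # count distinct bridging B terms
    return sorted(scores.items(), key=lambda kv: -kv[1])
```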
In the evolution of LBD methods, we can observe two broad trends: computational models used in LBD have grown dramatically in size (number of parameters) and algorithmic complexity, while the need for expensive human oversight has been slowly diminishing.
Yet this evolution has been far from linear. Each new wave brought its own requirements for human expertise, and even today’s most sophisticated systems, large language model (LLM)-based research agents, demand substantial supervision. In the limit, LBD tools should be able to take in a question from a human researcher and do all the search, synthesis, and plausible hypothesis generation on their own. We are still in the early days of such tools.
The earliest LBD approaches from the 1990s were remarkably simple – they merely counted how often words appeared together in papers (lexical statistics). In doing so, they treated research articles as ‘bags of words,’ without regard for meaning or context. This approach isn’t able to distinguish between sentences like ‘a dog bites a man’ and ‘a man bites a dog,’ since order and syntax are not taken into account in mere word counting. Neither does it distinguish between a statement and its negation – ‘X affects Y’ and ‘X doesn’t affect Y’ look the same, as X and Y co-occur in both cases. And since words may co-occur by chance rather than meaningful association, metrics used in lexical statistics can catch many spurious connections. Human judgement here is necessary to separate the wheat from the chaff.
In the context of drug repurposing, lexical statistics hasn’t been found powerful and selective enough to be used as a stand-alone method, though it has been incorporated into a suite of methods developed later. But a few studies from the 1990s have relied exclusively on lexical statistics to rediscover Swanson’s Raynaud disease – fish oil and magnesium – migraine links, demonstrating that even such a noisy and simple method can detect high-signal connections in the literature, though not without human curation.
A transformative breakthrough in LBD came with distributional semantics in the 2000s-2010s, particularly through word2vec. The underlying insight here is the distributional hypothesis, best summarized by the English linguist J.R. Firth in his 1957 paper as ‘you shall know a word by the company it keeps.’ Instead of raw word counts, these methods operate at the level of word meanings (semantics), by placing every term in a high-dimensional vector space based on its neighboring words. For example, a word like insulin is meaningfully associated with other terms like glucose, glycaemia, diabetes, pancreas, pancreatic β-cells and so on. In word2vec, these semantically related words translate into similar vectors. This in turn makes it possible to generalize beyond exact word matches and handle synonyms and analogies in papers computationally. Word2vec can also disambiguate syntactic relations like ‘a dog bites a man’ and ‘a man bites a dog,’ thanks to its contextual sensitivity.
Word2vec thus represents a major inflection point in LBD – a transition from word counting to semantic reasoning. From then on, semantics became an indispensable part of the LBD process. With the training corpus vocabularies counting millions of words and vector space dimensions of 100-300, the model size (measured as their product) jumped to hundreds of millions of parameters, a massive increase compared to the lexical statistics methods.
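For a concrete sense of the mechanics, here is a minimal sketch using the gensim library (assumed to be installed) and a toy stand-in for a corpus of tokenized abstracts; the hyperparameters shown are typical choices, not tuned values.

```python
from gensim.models import Word2Vec

# toy stand-in for millions of tokenized abstracts
abstracts = [
    ["insulin", "lowers", "blood", "glucose", "in", "diabetes"],
    ["pancreatic", "beta", "cells", "secrete", "insulin"],
    ["magnesium", "deficiency", "linked", "to", "migraine", "frequency"],
    ["calcium", "channel", "blockers", "reduce", "migraine", "attacks"],
]

model = Word2Vec(
    sentences=abstracts,
    vector_size=100,   # embedding dimension (100-300 is typical)
    window=5,          # context window around each term
    min_count=1,       # keep rare terms in this toy corpus
    sg=1,              # skip-gram variant
)

# terms whose vectors sit closest to "insulin" in the learned space
print(model.wv.most_similar("insulin", topn=5))
```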
Semantic methods
Around the same time that distributional semantics was entering the LBD toolkit, new ways of organizing and systematizing scientific knowledge enabled novel approaches in the field. In the late 1990s and early 2000s, researchers in biomedicine began organizing knowledge into structured databases and graph networks to help navigate the expanding biological vocabulary. A few characteristic problems had previously made this task challenging: a single gene or protein often went by multiple names (for example, the epidermal growth factor receptor in humans is known as EGFR, ErbB‑1 or HER1), and naming conventions varied across species. Without a shared reference system, comparing results between species was difficult.
Systems like the Gene Ontology (GO) solved this by assigning stable identifiers – ontology terms – to biological processes (e.g. digestion, glycolysis, photosynthesis), molecular functions (e.g. catalytic activity, DNA binding, kinase activity) and cellular components (e.g. ribosome, nucleus, cell membrane) to which the contents of biomedical research papers could be mapped. For example, a glycolytic enzyme like hexokinase, in any species, maps onto GO:0006006, glucose metabolic process. Building this shared reference system required a lot of manual labor upfront: annotating papers and devising the nested classification categories that make up an ontology. With these systems in place, literature-based discovery tools could index papers by concrete ontology terms and compare findings across studies and species more easily.
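At its core, this mapping step can be sketched as a lookup from surface forms to stable identifiers, as below; the synonym table is a toy illustration (GO:0006006 is taken from the example above, and the EGFR accession is included only as an illustrative identifier).

```python
# toy synonym table mapping surface forms to stable identifiers
SYNONYMS = {
    "hexokinase": "GO:0006006",          # glucose metabolic process
    "glucose metabolism": "GO:0006006",
    "egfr": "P00533",                    # illustrative accession for human EGFR
    "erbb-1": "P00533",
    "her1": "P00533",
}

def index_paper(abstract_tokens, paper_id, index):
    """Map a paper's surface terms onto stable identifiers so that papers
    using different names for the same entity become comparable."""
    for token in abstract_tokens:
        concept = SYNONYMS.get(token.lower())
        if concept:
            index.setdefault(concept, set()).add(paper_id)
    return index
```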
Also in the 2000s–2010s, emerging high-throughput technologies like sequencing, microarrays and proteomics generated massive datasets that needed to be integrated and interpreted across labs. Databases like KEGG (Kyoto Encyclopedia of Genes and Genomes), UniProt (protein sequence and function database) and Reactome (biochemical pathway library) began mapping genes and proteins onto curated pathways. Beyond a mere cataloguing of biological entities, these databases were designed to highlight interactions between them – which molecule activates or inhibits which other ones or which gene regulates which targets.
Graph-based methods
In light of this, graph-based methods of LBD became indispensable. Graphs represent knowledge as a network of connected concepts (nodes), with edges corresponding to relations between them. The main innovation here is the ability to trace longer chains of reasoning rather than just looking at pairs of related terms. LBD is abstracted as a link prediction (or missing edge) problem within the graph: given only the edges observed so far, which pairs of nodes are most likely to be connected? If two nodes already share a large number of common neighbors, it is likely that hidden links connecting them directly remain, pointing to a previously missed discovery. Models built on knowledge graphs have parameter counts in the tens to hundreds of millions, and the graphs themselves still require a lot of human curation.
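As a minimal illustration, link prediction by shared neighborhood can be sketched with networkx using the Adamic-Adar index, one of several standard scores of this kind; the tiny concept graph below is made up for the example.

```python
import networkx as nx

# toy knowledge graph: nodes are concepts, edges are reported associations
G = nx.Graph([
    ("migraine", "serotonin"), ("migraine", "prostaglandins"),
    ("magnesium", "serotonin"), ("magnesium", "prostaglandins"),
    ("magnesium", "calcium channels"), ("migraine", "vascular tone"),
])

# rank all currently unconnected pairs by shared neighborhood (Adamic-Adar);
# high scores flag candidate "missing edges", i.e. possible discoveries
scores = nx.adamic_adar_index(G, nx.non_edges(G))
for u, v, s in sorted(scores, key=lambda t: -t[2])[:5]:
    print(f"{u} -- {v}: {s:.2f}")
```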
Another innovation of knowledge graphs is that in addition to technical terms (of different types), they can include metadata like papers, authors, their affiliations and citations. Such graphs are called heterogeneous, or hetnets. They can detect cross-field patterns and directions of knowledge flow that are impossible to infer from the methods relying on the text of research papers alone. An example of a hetnet is Hetionet, which is assembled from 29 different databases of genes, bioactive compounds, diseases, drug side effects and more, consisting of 47,031 nodes (11 types) and 2,250,197 relationships (24 types).
Machine learning methods
In the 2010s, machine learning entered the picture. Broadly, ML models are those that learn from data without explicit programming and can identify patterns and make predictions about unseen data. Applied to Swanson’s ABC model, classifier algorithms like support vector machines and random forests can be trained to rank intermediate B terms between A and C terms, and can filter candidate B lists much faster than human researchers. But since they are trained on examples of already known links, these algorithms tend to come up with conservative, incremental hypotheses rather than truly novel ones.
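A sketch of what such a B-term ranking classifier could look like with scikit-learn; the hand-crafted features and the tiny training set are purely illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# each candidate B term is described by simple literature-derived features,
# e.g. co-occurrence counts with A and with C, and its overall frequency
X_train = np.array([
    # [cooc_with_A, cooc_with_C, overall_freq]
    [12, 9, 40],    # a B term from a known, validated A-B-C chain
    [30, 1, 900],   # a generic term that co-occurs with almost everything
    [7, 11, 25],
    [2, 0, 600],
])
y_train = np.array([1, 0, 1, 0])   # 1 = bridged a real discovery, 0 = did not

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# rank unseen candidate B terms by predicted probability of being a true bridge
X_candidates = np.array([[10, 8, 35], [25, 2, 700]])
print(clf.predict_proba(X_candidates)[:, 1])
```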
The next wave of LBD methods, from the late 2010s, consisted of deep embedding methods like DeepWalk and node2vec. These models convert an entire biomedical knowledge graph into a geometric space where every concept is a point, and the distance between two points indicates how strongly the corresponding ideas are implicitly related. Such a mapping lets the algorithm filter through millions of papers at once and find pairs of concepts that don’t co‑occur in the existing literature but nonetheless share rich structural context within the graph (for example, a metabolic pathway and a drug that hasn’t previously been tested for its effects on that pathway).
By acting on the graph rather than on raw text (as word2vec does), deep embeddings can trace multi‑step patterns, starting from a gene all the way to a disease, and find cross‑disciplinary connections. Their use cases range from disease etiology to drug–disease associations to protein–protein interactions. Still, a deep embedding map is a frozen snapshot of a graph, and it has to be rebuilt each time new literature is integrated. And of course, the quality of insight derived from it still depends on how carefully the underlying graph was curated.
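A DeepWalk-style sketch of how such embeddings can be built, assuming networkx and gensim: random walks over the knowledge graph are treated as ‘sentences’ and fed to word2vec (node2vec additionally biases the walks, which is omitted here).

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def deepwalk_embeddings(G, walk_length=20, walks_per_node=10, dim=64):
    """Embed graph nodes by running random walks and feeding them to word2vec,
    so that structurally related concepts end up close in vector space even if
    they never co-occur in the text of any single paper."""
    walks = []
    for node in G.nodes:
        for _ in range(walks_per_node):
            walk, current = [str(node)], node
            for _ in range(walk_length - 1):
                neighbors = list(G.neighbors(current))
                if not neighbors:
                    break
                current = random.choice(neighbors)
                walk.append(str(current))
            walks.append(walk)
    model = Word2Vec(walks, vector_size=dim, window=5, min_count=1, sg=1)
    return model.wv    # e.g. model.wv.most_similar("magnesium")
```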
Graph neural networks are an improvement over the static maps of deep embedding, as they are able to dynamically update node representations in knowledge graphs and capture contextual nuances better. This contextual awareness allows GNNs to trace more sophisticated reasoning paths and, crucially, to scale to graphs with millions of nodes. GNNs have been shown to improve LBD performance 2-4 fold compared to co-occurrence methods. Still, they have similar limitations to deep embedding methods, as they can be biased by highly connected nodes, and adding new nodes after the initial training requires retraining of the model.
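A minimal sketch of the GNN approach, assuming PyTorch Geometric: a two-layer GCN encoder produces context-aware node embeddings, and a dot-product decoder scores candidate edges; the training loop over positive and negative edge samples is omitted for brevity.

```python
import torch
from torch_geometric.nn import GCNConv

class GNNLinkPredictor(torch.nn.Module):
    """Two-layer GCN encoder plus a dot-product decoder over node pairs."""
    def __init__(self, num_features, hidden_dim=64):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)

    def encode(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv2(h, edge_index)           # contextual node embeddings

    def decode(self, z, pair_index):
        # score candidate edges by the dot product of their endpoint embeddings
        return (z[pair_index[0]] * z[pair_index[1]]).sum(dim=-1)

    def forward(self, x, edge_index, pair_index):
        return self.decode(self.encode(x, edge_index), pair_index)
```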
References
Thalmann O, Perri AR (2018). "Paleogenomic Inferences of Dog Domestication". In Lindqvist C, Rajora O (eds.). Paleogenomics. Population Genomics. Springer, Cham. pp. 273–306. doi:10.1007/13836_2018_27. ISBN 978-3-030-04752-8.
Linnæus C (1758). Systema naturæ per regna tria naturæ, secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis. Tomus I (in Latin) (10 ed.). Holmiæ (Stockholm): Laurentius Salvius. pp. 38–40.