Morgenstern, J. Using AI to improve scientific literature search results, First10EM, September 16, 2024. Available at:
https://doi.org/10.51684/FIRS.137526
Readers of First10EM will know that I spend way too much time on PubMed searching the medical literature. I use the website daily. It is probably the most used link on my computer. Despite that, I am more than willing to admit that PubMed – and specifically its search function – sort of sucks. It just isn’t good. To get even halfway decent results, you basically need to be a medical librarian, having memorized all the specific MeSH terms PubMed uses. PubMed just doesn’t handle natural language well.
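To make the MeSH point concrete, here is a rough sketch of the gap between a natural-language question and a librarian-style query, built against PubMed’s own E-utilities search endpoint. (The endpoint and parameters are from NCBI’s public documentation; the query strings themselves are just illustrations, not recommended search strategies.)

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for searching PubMed
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL for a given PubMed query string."""
    return ESEARCH + "?" + urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    )

# A natural-language question, roughly what you might type into Google:
natural = pubmed_search_url("is aspirin helpful in heart attack")

# The same question as a librarian-style query, using MeSH terms and a
# publication-type filter:
tagged = pubmed_search_url(
    '"aspirin"[MeSH Terms] AND "myocardial infarction"[MeSH Terms] '
    'AND randomized controlled trial[Publication Type]'
)

print(natural)
print(tagged)
```

The first URL leans entirely on PubMed’s automatic term mapping; the second spells out exactly which indexed concepts you want, which is precisely the kind of memorized vocabulary that makes PubMed unfriendly to casual searchers.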
Google Scholar is better. It uses natural language, so the results are closer to what you were expecting, and the search looks and functions more like the tool we all use to search the web every day. However, the search function is still imperfect, with a large majority of the results being completely irrelevant.
I have used various alternatives over the years. At one point, I really enjoyed the Trip database. It is still a very valuable tool, which is more likely to present you with relevant articles, and the filtering options are also very good. However, since the introduction of a paid version, I have found my experience to be more hit and miss.
In 2024, there must be a better way to search the medical literature. Can’t I just use ChatGPT or one of these other incredibly powerful AI tools to do all the grunt work for me? While I definitely wouldn’t trust ChatGPT for scientific work, there are a number of AI tools specifically designed to improve literature searches. This is a quick summary of my initial foray into these tools. (For the most part, I am assuming that I, and most of my readers, will be using the free versions of these tools, and so that is where my review is focused.)
The YouTube Video
Consensus
This is a very helpful tool that not only provides a list of references, but also summarizes the research and determines whether there is a scientific consensus on the topic.
For example, when I ask the tool “is aspirin helpful in MI?”, it provides an answer of 100% yes, with a summary of “these studies suggest aspirin is helpful in reducing the risk of myocardial infarction (MI), stroke, and vascular deaths, both in acute and preventive contexts.” On the other hand, when I asked “does a patient need to have an empty stomach before surgery?”, it says that there is not enough information to provide a consensus answer, and the summary it gives is “some studies suggest that an empty stomach is recommended before surgery to reduce the risk of regurgitation and aspiration, while other studies indicate that allowing certain fluids preoperatively may not increase these risks and could improve metabolic outcomes.”
Honestly, both of those answers are pretty good. They are definitely good enough to know where the consensus opinion is before starting a literature review. They might even be good enough to use for rapid clinical answers while working, although I have not used the tool enough yet to suggest that use.
Consensus also has a “co-pilot” feature which will summarize the literature found, and the results are surprisingly good. This is the result of my search about fasting for surgery, and yes, all those numbers are links to real papers, with already formatted citations:
Cost
You get unlimited free searches, but are limited to 20 credits a month to be used on the more advanced features like the AI consensus meter, co-pilot summaries, or paper summaries. It will cost you $9 a month to get unlimited access to those features.
Use
In the long run, it would be amazing if these tools were accurate enough to provide rapid answers to EBM questions while working clinically. At this point, I don’t think we should trust them for clinical answers, but these will still be incredibly valuable tools. If a question comes up while working, you can get a quick sense of whether there is a consensus. If the AI thinks there is a strong consensus, it might be good enough to just download and read the linked systematic review. If there is no consensus, you are going to want to turn to the primary literature. This tool will get you started on that literature search, but is not adequate on its own. (We will discuss tools designed for an in depth literature search below.)
Perplexity
Perplexity is very similar to Consensus, but I have been somewhat less impressed so far. It seems to draw more heavily on websites, which are often patient information websites, rather than from medical literature or guidelines. For example, this is the summary provided for the question “does a patient need an empty stomach before surgery?”:
It is hard to see here, but all of the sources used are patient information websites like this one. I have subsequently done other searches that look more evidence based – for example, the answer to “are IV antibiotics stronger than oral” was reasonable – but it will be hard to fully trust the results after seeing this. (Not that I suggest trusting any AI results.)
Of note, Perplexity has a mobile app, which would be a valuable tool if it ever got good enough to answer quick clinical questions on shift.
Cost
There is a free version of Perplexity that basically lets you do everything you need to do, but limits your number of searches per day.
Use
Perplexity could be used a lot like Consensus, but based on my initial experiences, I don’t think it should be used for scientific or EBM searches.
Semantic Scholar
https://www.semanticscholar.org
At first glance, this seems like a tool that would be valuable as part of a larger research project. I wish it had been around when I was writing my thesis. It lets you save papers that have been influential to your research, and then will use that data to help you find other relevant studies, and keep you up to date as new studies are published.
The basic search function is really not all that different from PubMed. The filters might be a little more intuitive, but the results were almost equally irrelevant. (Admittedly, that might be because I have not spent much time training the system on the type of papers that I find valuable.)
There is a fairly clear divide among the software options I have explored. Some, like Consensus, are incredibly good at providing answers to new questions (and unlike the mainstream AI systems, these answers seem to be based in reality, with links to underlying research). For my workflow, where I am starting with a new research question every few weeks, these tools are very valuable. However, these tools don’t usually store, organize, or expand upon your results.
Other tools, like Semantic Scholar, are much better for long term research projects. Not only do they store and organize your papers, but they seem like they will learn over time, and become more useful the more you use them.
One huge advantage of Semantic Scholar over PubMed is that a very large number of the papers have a PDF accessible with a single click.
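For readers comfortable with a little scripting, Semantic Scholar also exposes a free public API that returns those open-access PDF links directly. A minimal sketch, based on the documented Academic Graph search endpoint (the query itself is just an illustration):

```python
from urllib.parse import urlencode

# Semantic Scholar's free Academic Graph API (no key needed for light use).
# Endpoint and field names come from their public docs; the query is illustrative.
API = "https://api.semanticscholar.org/graph/v1/paper/search"

params = {
    "query": "thrombolysis intermediate risk pulmonary embolism",
    "fields": "title,year,openAccessPdf",  # openAccessPdf is the one-click PDF link
    "limit": 10,
}
url = API + "?" + urlencode(params)
print(url)

# To actually run the search, fetch this URL, e.g. with urllib:
# import json, urllib.request
# papers = json.loads(urllib.request.urlopen(url).read())["data"]
# for p in papers:
#     print(p["year"], p["title"], (p.get("openAccessPdf") or {}).get("url"))
```

This is obviously overkill for a single clinical question, but for a large review it is a quick way to pull a first pass of titles and PDFs into your own tools.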
The basic search output looks very similar to PubMed or Google Scholar:
Cost
As far as I can tell, this service is completely free.
Use
This is essentially a direct PubMed replacement, with some upsides, but it might not be the best option available.
SciSpace
I have not paid for this tool as of yet, but my sense is that it might be the most powerful tool overall if you take advantage of all the customizable features. It will provide a summary for you, but it also produces a table that you can customize extensively. In that table, you get a list of research papers, just like any search of PubMed or Google Scholar, but what makes the output valuable is that it will automatically give you quick summaries of each paper, with options such as a 2-3 sentence key insights column, or just a TLDR (too long; didn’t read) summary of the paper. For a researcher looking for a replacement for PubMed, I think this might be the best option, but for it to be really useful you probably need to pay for the premium version.
Here is an example of what I think is a reasonable summary of the evidence for the question “is aspirin helpful for myocardial infarction?”:
Here is an example of the table a literature search produces:
There are a large number of options for the columns, and they seem great, such as simply displaying the methodology used, or getting a 1 sentence AI generated summary of the paper.
Cost
I found the free version of this software to be much more limiting than others. Premium is $12 a month for individuals or $8 for groups.
Use
This is a high quality PubMed replacement. I started this journey looking for free tools, but the more I look at this beautiful tabular output, the more I think I might pay for this service.
Lens.org
Lens clearly has a commercial aim, integrating not just a scholarly search, but also an extensive search of patent databases. It provides a ton of information, including which companies hold the most patents in your area of interest, as well as which institutions or authors are publishing the most papers. (That last bit might be interesting for students deciding on a school, or researchers looking for collaborators.) The data displayed is pretty amazing if you want to get a broad overview of the type of work being done, where, and why it is being done. For the most part, these results are overkill for my purposes.
This is the type of broad data visualization you get when you search “thrombolytics for pulmonary embolism”:
Authors with the most publications for “thrombolytics for pulmonary embolism”:
In terms of the actual literature search, the results appear to be very extensive, and there are a broad variety of filtering options, but I found the interface to be very slow, and results were no more relevant than what I would find using natural language searches on PubMed. In fact, I found the interface to be less useful than PubMed (although, to be fair, I have years of experience with PubMed, and only played with this for a short while before I got too frustrated to continue.)
I will say, if you were using this from the outset, it does look to have some powerful features. You can save different articles into different collections. You can make notes about the articles within the database. You can add tags. However, other tools on this list have easier access to the papers themselves, sometimes providing the PDFs directly, which Lens.org does not. It is a powerful website, but I don’t think it is the best option for basic medical literature searches.
Cost
People doing non-commercial work are able to sign up for a free account. As far as I can tell, that gets you almost all the services, although there are some higher end add-ons for purchase.
Use
This is not a tool I would use for answering a rapid clinical question, and it probably isn’t the best option for an in depth literature review (although it definitely can serve that function). However, if you are looking for a broad overview of the state of a particular field of research, including things like where the money is and where the research is being done, this tool might be very helpful.
Scite
Unlike the other products reviewed here, Scite does not have a free option. It therefore doesn’t really fit my criteria, but I have heard of a few people using it, so I thought I would include a link. I played with a few of their tools as part of a demo / free trial, and they are strong. The search results look good, but not necessarily special as compared to the other options here. The research assistant is essentially an AI chatbot based on up to date scientific resources, and provides the kinds of outputs you would expect:
There is a browser plugin and a Zotero plugin that could make a difference depending on your workflow, but I wasn’t going to buy this product, so I didn’t spend any time exploring them.
Research Rabbit
This tool is incredibly helpful at finding papers related to one you already know to be relevant. For example, you find a single RCT on a topic, and you wonder if others have been done. That is often a difficult question to answer with PubMed, but by adding the paper to Research Rabbit, it will rapidly display similar research. The interface is very powerful. It lets you add papers directly, but will also link with a Zotero account if you use a citation manager. Unlike other sites that provide a similar service, Research Rabbit can search based on just a single paper, but also based on any collection of papers you provide. There are many customizations available. In addition to searching by topic, you can search by specific researchers. You can focus only on future papers, or previous papers. There are options for exporting the results in various formats.
These are the results I get when I started with the PEITHO and MOPPET 3 studies for thrombolysis in PE:
Cost
This is a free tool, and they say it will remain free forever.
Use
This tool, too, might be more than the average clinician with a simple clinical question needs, but for anyone who has ever had to perform a literature review manually, it is an absolute game changer.
Connected papers
https://www.connectedpapers.com
This is another really useful tool to expand a search once you have found a paper you know is relevant. It will prepare a pretty looking graph based on citations, or you can just display the results in list form. There are a number of useful filters, allowing you to focus on papers published before or after the paper you searched with, as well as to focus only on papers available open access or that have a pdf available.
This is what my results look like when I start with the PEITHO trial of thrombolysis for intermediate risk PE:
Cost
The free version will only give you 5 searches per month. The $5 monthly academic subscription gets you unlimited searches. I like that they note that they have a scholarship program available for people who need the pro version but are truly unable to afford it.
Use
Again, this type of tool is incredibly useful when performing an in depth literature review, but this is very similar to Research Rabbit, and considering that Research Rabbit is free and actually seems to have more features, I am not sure why anyone would pay for this.
Microsoft Copilot
Obviously, Microsoft Copilot is not a specific research AI, but its features can be helpful as part of a large literature search. (There probably isn’t anything special about Copilot; it just happens to be the AI system built into a browser, and so it is easy to use when performing searches.) The idea would be to perform a literature search exactly like you always do, but use the Microsoft Edge browser instead. Open the Copilot tab, and then when you find papers or abstracts, you can have Copilot rapidly summarize those papers for you, to limit the number you have to read in full. I wouldn’t trust these summaries for my final results, but they relieve some of the headache of sorting through hundreds of pages to determine whether a document is worth downloading and reading.
Your result will look something like:
Summary
I truly think these tools will revolutionize the way we perform literature searches. Your (and perhaps my) time is too valuable to be bogged down in PubMed simply reviewing search results. We never go past the first page of a Google search, but to get anything useful out of PubMed you are often forced to go through dozens of pages of results. I have no doubt that these tools will be incredibly valuable, but there are so many options that I honestly haven’t settled into a clear workflow yet. (I would love to hear the experiences and opinions of others here.)
I hope it goes without saying, but these tools should not replace critical thinking. You will still need to use your brain, but your brain should be focused on analyzing, not finding papers. In my career, I have spent countless hours just scrolling through PubMed scanning abstracts trying to find relevant papers. If that time had been spent instead reading, analyzing, and discussing papers that an AI had found for me, I would have been far more productive.
Like any tool, your rating will almost certainly be determined by your goals. (A shovel is a great tool, but not if you are attempting to hammer a nail.) I will admit that I am early enough in my exploration of PubMed alternatives that I don’t know my specific goal. It might be to provide better results. It might be to provide the exact same results that I have been getting, but to do it faster. It might be to help me ask more interesting or impactful questions; questions that are related to but tangential to those that started my search. I am not sure exactly how I will use these tools going forward. The value of these tools to you will invariably depend on what you’re hoping to get out of them, but I am convinced that they will be valuable to anyone working with the medical literature.
I know there are many other AI based research tools out there. If I missed one that you use, think is better than the above, or provides different kinds of results, please leave it in the comments below.
