Metascience tool for replicability: scite

I came across scite.ai recently. It’s a tool that looks at how references to a paper cite it, classifying each citation (as supporting, contradicting, or neutrally mentioning) based on the text surrounding the citation, and then links these back to the original paper. I think this has a lot of potential, and it could be particularly helpful when surveying a new field where you are not yet familiar with what is reliable. It could also be useful for highlighting replicability/non-replicability, as there doesn’t seem to be an existing way to do this besides looking for a comprehensive review article (or writing one yourself!).

My understanding is that scite works by doing natural language processing/machine learning on full-text files. I’d be interested to hear what other IGDORE members think of this ‘computer-assisted’ way of judging scientific validity. Would a supported/questioned flag change how you judge a paper’s quality?
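To make the idea concrete: the task is to take the sentence surrounding a citation and bucket it into one of the categories. The sketch below is purely illustrative, a naive keyword heuristic in Python that I made up; it has nothing to do with scite’s actual deep learning model, and all the cue phrases and example sentences are invented.

```python
# Toy illustration of citation-context classification.
# NOT scite's model -- just a keyword heuristic to show the general idea:
# look at the text around a citation and assign it a category.

SUPPORT_CUES = ("consistent with", "confirms", "replicates", "in line with")
CONTRAST_CUES = ("contrary to", "contradicts", "failed to replicate", "unlike")

def classify_citation_context(sentence: str) -> str:
    """Return 'supporting', 'contradicting', or 'mentioning' for one excerpt."""
    text = sentence.lower()
    if any(cue in text for cue in CONTRAST_CUES):
        return "contradicting"
    if any(cue in text for cue in SUPPORT_CUES):
        return "supporting"
    return "mentioning"  # neutral default, e.g. an introduction-style citation

examples = [
    "Our results are consistent with Smith et al. (2019).",
    "We failed to replicate the effect reported by Smith et al. (2019).",
    "Smith et al. (2019) studied this in a student sample.",
]
for s in examples:
    print(classify_citation_context(s))
```

A real system would of course learn these patterns from annotated training data rather than from a hand-written cue list, which is presumably why scite needs large-scale manual annotation (see Josh’s replies below).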

PS: It’s also fun to search for your own papers on scite; hopefully none are in question :wink:


Has anyone found their own papers on this site yet?

Yes, I think all of my publications are indexed on scite (here). Did you have any you couldn’t find?

My only two papers are in Russian. I think it doesn’t index them.

Very interesting initiative! I searched for my own papers. One of the papers had 10 mentions. Unfortunately, all the mentions were inaccurately classified as supporting / contradicting the findings: the mentions were made in a neutral manner (e.g. in the introduction sections), but scite.ai still classified them as supporting / contradicting. I understand that they put a figure (proportion? rating?) in each category (mentioning/supporting/contradicting), but I didn’t find it to be very accurate in this case. I only checked one of my papers though, so it might work better than I think.

A similar but different initiative is Curate Science: https://curatescience.org/

Manual entries, no AI. They’re doing many different things but one of them is some sort of database listing and connecting replications to the original papers and trying to estimate how strongly we should rely on the findings.

Similarly, there’s RepliCATS and the larger DARPA-funded project it belongs to.

https://replicats.research.unimelb.edu.au/

The bigger aim of that project (as I understand it) is to build an AI that can accurately predict the likelihood of a certain finding being successfully replicated. That would be a huge step in the right direction if they succeed.


The wonders of AI…

I think AI will be more accurate soon

I’m in a slack group with a founder of scite (Josh Nicholson) and asked how accurate it was and about the case of multiple citations from one paper to another. Response:

The precision for both supporting and contradicting is above 0.8 and for mentioning is ~0.99. We evaluate the model based on precision and recall but weight towards precision. Definitely difficult and will never be perfect, which is why we allow users to flag mis-classified citations so they can be manually changed.

We classify citation statements that are at the claim level so indeed a paper can support and contradict another paper. This also means we count citations differently as one paper can cite another 3 times (we take all three excerpts and classify them).
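For readers less familiar with the metric Josh quotes: precision for a class is the fraction of excerpts the model labels with that class that truly belong to it (so precision 0.8 for “supporting” would mean roughly 4 in 5 supporting labels are correct). A quick illustration with made-up labels, not real scite data:

```python
def precision(true_labels, predicted_labels, cls):
    """Fraction of predictions of `cls` that are correct."""
    true_of_predicted = [t for t, p in zip(true_labels, predicted_labels) if p == cls]
    if not true_of_predicted:
        return 0.0
    return sum(t == cls for t in true_of_predicted) / len(true_of_predicted)

# Made-up example: five citation excerpts, true vs predicted categories.
true = ["supporting", "mentioning", "supporting", "contradicting", "mentioning"]
pred = ["supporting", "mentioning", "mentioning", "contradicting", "supporting"]

print(precision(true, pred, "supporting"))  # 1 of 2 'supporting' predictions correct -> 0.5
```

Weighting towards precision, as Josh describes, means preferring to miss some supporting/contradicting citations (lower recall) rather than mislabel neutral mentions, which seems like the right trade-off for a tool people may use to judge reliability.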

I didn’t actually know about Curate Science or RepliCATS before, but know a few people who have been talking about trying to predict the accuracy of trials replicating or otherwise judge the quality of as yet unreplicated studies. These all sound like good steps forward!

Hi Rebecca,

Thanks for the feedback. Would you mind linking to the report you’re referring to? Supporting and contradicting citations represent ~4% and ~1% of citations respectively, so it’s curious that you found a paper with 10 citation statements that were all supporting or contradicting. Maybe our UI is unclear? If so, it would be great to understand so we can improve.

You can learn more about our classifications here: https://help.scite.ai/en-us/article/how-are-citations-classified-1a9j78t/ and if you come across errors, you can flag misclassified cites by following these instructions: https://help.scite.ai/en-us/article/how-do-i-fix-a-mis-classified-citation-7no39b/

Last, we’re well aware of the shortcomings of AI, but in order to classify 430M cites (and growing), there really is no other way: we started scite doing it manually, and it is our manual annotations that train the deep learning model. One thing we try to emphasize, beyond the classifications themselves, is that you can quickly see how something has been cited by looking at the context, which I think really helps assess the literature in a new way.

Thanks again for the feedback!

Best wishes, Josh


Hi Josh and welcome to the forum! :wave:

This is a very interesting project, especially for the different point of view it brings by looking at how citations are used. It could really help with literature reviews, as well as help identify biased field niches.

My comment on AI comes from a position that is skeptical but enthusiastic about its use for tasks that could never be accomplished manually (or only with a huge and highly trained workforce). I have recently been involved in checking AI classification of scientific abstracts funded by the EU in the Horizon 2020 program. It was quite a disaster!

I am sure deep learning algorithms will get better and better. I was wondering about a couple of things, though:

  1. When somebody flags a misclassification, is this information integrated in the algorithm?
  2. Are you still running manual classification in parallel with the AI?

Cheers,

Enrico

Thanks for the follow up and more context. My answers below:

  1. When you flag a mis-classified cite, it is reviewed by two independent experts (scite team members) and accepted or rejected. If accepted, the snippet will then say “expert classified” next to it, instead of the model numbers that are displayed next to snippets classified by the model. This data is indeed added to the training data as we make improvements to the model, which happens roughly once a month.

  2. We are running manual classifications to continually build the training data. In fact, we’re looking for researchers to help here and are happy to pay for their time ($30 per 100 snippets annotated). People will always be allowed to flag cites on the platform.

Let me know if you have any other questions. We’re also very open to feedback on new features or improving the UX, so happy to hear your thoughts there too.

Best, Josh