Automating Academic Reviewer Finding With Microsoft Academic and Google Apps Script

Matthew McKeever
Feb 1, 2020

Many people’s livelihood depends on publishing peer-reviewed papers in academic journals. Whether or not that’s a good state of affairs is another issue, but as a matter of fact, right now, there are many people out there anxiously waiting on long overdue decisions that may shape what their life will be like next year. It would be nice to make things easier for such people, as well as the reviewers and editors bound up in the current system.

One of the big problems with that system is finding reviewers. Editors are limited by their own knowledge and by various publicly available datasets (such as Google Scholar, Web of Science, and discipline-specific tools like philpapers.org or thephilosophypaperboy.com), and while these, especially the latter, are useful, they are not tailor-made for reviewer finding.

My aim here is to make something that is tailor-made for reviewer finding. It works by taking data from one publicly available dataset (namely that of Microsoft Academic, which is basically Google Scholar but Microsoft), extracting from it a list of possible reviewers, creating a database from that, and letting you query this newly created database to find reviewers for a paper under your editorship.

The idea is very simple; below I will mention some further additions I intend to work on in the coming months. But already this simple setup has helped me find reviewers for the journal I work for, Inquiry: An Interdisciplinary Journal Of Philosophy.

It works like this. In order to review a paper for a journal, typically one must have published on the topic, and moreover done so in another good reliable journal (where different people might understand different things by ‘good’ and ‘reliable’). Finally, it’s typically the case, in my experience, that the people most likely to accept invitations to review papers have published in good reliable journals more recently.

So three things determine a given paper’s author’s suitability to be a reviewer: the paper’s topic, where it was published, and when it was published. Now imagine you’re an editor and you have a pile of papers to find reviewers for. These papers often come with keywords that indicate the topic they are about. The idea behind what I’ll present here is that we look for matches between the keywords of submitted papers and the topics of a large database of papers, where these topics are represented by the title of the paper. We take all the papers in the world, filter them by date and journal, then try to match our submitted paper to papers in our database by looking for matches between the former’s keywords and the latter’s titles.
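A minimal sketch of that filter-then-match idea, in plain JavaScript (which is what Apps Script is). The record shape and the name `findCandidates` are made up for illustration; this is not the library’s actual code.

```javascript
// Hypothetical record shape: { title, journal, year, authors }.
// Not the ReviewerFinder library's code; just the idea from the paragraph above.
function findCandidates(papers, keywords, journals, sinceYear) {
  const wantedJournals = journals.map((j) => j.toLowerCase());
  const wantedKeywords = keywords.map((k) => k.toLowerCase());

  return papers
    // Keep only recent papers from journals we trust.
    .filter((p) => p.year >= sinceYear &&
                   wantedJournals.includes(p.journal.toLowerCase()))
    // Record which of the submission's keywords occur in each title.
    .map((p) => ({
      paper: p,
      hits: wantedKeywords.filter((k) => p.title.toLowerCase().includes(k)),
    }))
    // Drop papers whose title matches no keyword.
    .filter((m) => m.hits.length > 0);
}

// Made-up data.
const samplePapers = [
  { title: 'A semantic problem for stage theory', journal: 'Analytic Studies', year: 2019, authors: ['A. Author'] },
  { title: 'Stage theory defended', journal: 'Analytic Studies', year: 2009, authors: ['B. Author'] },
  { title: 'On evidence', journal: 'Some Other Journal', year: 2019, authors: ['C. Author'] },
];

const candidates = findCandidates(samplePapers, ['stage theory', 'semantics'], ['analytic studies'], 2015);
console.log(candidates.map((m) => m.paper.authors.join(', ')));
```

Only the first paper survives: the second is too old and the third is in the wrong journal. Note that the keyword ‘semantics’ does not match the title word ‘semantic’; plain substring matching is exactly that literal.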

As I say, simple. This probably won’t work in a bunch of disciplines that have more creative and less perspicuous paper titles, but it should work in disciplines like philosophy where, for example, if there’s a theory called stage theory, and you have a problem to do with semantics for it, you might call your paper ‘A semantic problem for stage theory’. More generally, this might work in disciplines where keywords of papers appear frequently in titles. The additions mentioned above will hopefully extend its reach to other disciplines, and I welcome people who want to take a crack at adding to this basic idea.

What I offer here is a library, for Google Apps Script, that allows you to perform the two functions mentioned above: making a database of potential reviewers, taking data from Microsoft Academic, and querying that database to find reviewers for submitted papers by matching keywords with titles. (It can also help researchers find papers on topics they are interested in, something that might be particularly valuable in disciplines that, unlike academic philosophy, don’t already have very good resources for doing so.)

Some may balk at using Google Apps Script, from two directions. More experienced coders might think this is a task better suited to a simpler and more familiar language, like Python, and indeed this is true. And less experienced coders might already be concerned that with talk of libraries and scripts we go beyond their comfort zone, and something completely easy to use is preferable. Again, true.

In my view, though, Apps Script is a pretty decent compromise: it’s powerful and lets you do a lot of stuff, so that experienced coders can easily add to (and probably fix) what I offer here. But as I’ve set it up, it requires no coding on the user’s part, and (this is a big one) no installation. Installing Python and the various dependencies is a hassle, if a mild one, that might already put people off. By contrast, the setup time of what I’ll be presenting is probably about 15 minutes, and, moreover, it is completely platform independent (well, not quite: it’s entirely dependent on the Google platform, but I’ll assume most readers are already dependent on it, so it’s not adding dependency). And finally, the way we’ll do things costs me nothing, either in money or in computing resources. (This also explains why I didn’t use something like Web of Science, which requires a subscription. Apart from the moral offensiveness of paywalling this important data, I want this to be usable by people without the requisite institutional access, which includes, incidentally, me.) So I think the compromise is a good one.

What follows is a step-by-step guide to setting this up. Everything here — including this guide — is a work in progress, so please get in contact with mistakes, improvements, and so on.

Step 1: Get a Microsoft Project Academic Knowledge API key

Don’t worry if you don’t know what an API key is — it’s basically a way to authenticate yourself so that you can use code to automate the process of getting paper information from the database (as opposed to just using the search box at academic.microsoft.com, which you should do as well, if only to make sure that the journals you are interested in are in their database. I have no reason to doubt they would be, but you may as well check.) So go to mrs-apis.portal.azure-api.net, then click subscribe, then click Project Academic Knowledge, then click subscribe again, then click ‘Sign up now’, then confirm your email by replying to the email they’ll send you.

On the screen you’re taken to, press the subscribe button again, then project academic knowledge, then ‘Academic Search API’.

On the screen that appears, click subscribe.

You should be taken to your user console. Click ‘show’ on your primary key, and take a note of it. This will be important. For the purposes of the rest of the document, pretend your primary key is ‘be5eb’. Your actual key will be much longer.
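To make the role of the key a bit more concrete, here is a rough sketch of what a request to the API might look like. The endpoint URL, the header name and the attribute codes are what I understand the Project Academic Knowledge documentation to specify, so treat them as assumptions and check them against your own console. The name `buildEvaluateRequest` is made up for illustration and is not part of the library.

```javascript
// Assumed endpoint and header for the Project Academic Knowledge "evaluate" call.
// Check both against your own API console; they are not taken from the library.
const EVALUATE_URL = 'https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate';

function buildEvaluateRequest(apiKey, expr, count) {
  const query = [
    'expr=' + encodeURIComponent(expr),
    'count=' + count,
    // Title, year, author name, author affiliation, journal name.
    'attributes=' + encodeURIComponent('Ti,Y,AA.AuN,AA.AfN,J.JN'),
  ].join('&');

  return {
    url: EVALUATE_URL + '?' + query,
    options: {
      method: 'get',
      headers: { 'Ocp-Apim-Subscription-Key': apiKey }, // this is where the key goes
      muteHttpExceptions: true, // return error responses instead of throwing
    },
  };
}

const request = buildEvaluateRequest('be5eb', "Composite(J.JN=='analytic studies')", 10);
console.log(request.url);

// In Apps Script you would then send it with something like:
//   const response = UrlFetchApp.fetch(request.url, request.options);
//   const data = JSON.parse(response.getContentText());
```

The point is just that the key travels with every request, which is why you need to keep a note of it and shouldn’t share it.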

Step 2: Create a Google Apps script with the Reviewer Finder library.

I assume you have a Google account and are logged in; if not, get one and do so. If you dislike and avoid Google, let me know and I’ll share my Python code. Go to script.google.com and click ‘New Project’. Give your project a sensible name, let’s say Reviewerfinder_yourname. Open a new tab, and go to https://script.google.com/d/1fhEpIzmQEMDHoMxITMjkL-K1H4m0v36Se6Yt-f7oCSOyLIwSGzjjXBts/edit?usp=sharing.

Actually, this step isn’t strictly necessary, but it shows you the library that you’ll be working with, the functions that let you read from the Microsoft database, and query your own database.

Now, go back to your project, i.e. Reviewerfinder_yourname. Click on the resources menu, then the library button, and you should see a screen like this:

In the box after ‘Add a library’, type the document id of the library. The document id of any Google file is the very long alphanumeric string in the url. In this case, it is 1fhEpIzmQEMDHoMxITMjkL-K1H4m0v36Se6Yt-f7oCSOyLIwSGzjjXBts. The library ‘ReviewerFinder’ should appear on your screen, under ‘Title’. Select version 7 (at the time of writing the only available one), and click save. Now you’re done!
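If copying ids out of urls by hand feels error-prone, a small helper can do it for you. This is a sketch that assumes the usual `/d/<id>/` url shape; `extractDocumentId` is a made-up name.

```javascript
// Pull the document id out of a Google Docs/Sheets/Script URL.
// Assumes the usual ".../d/<id>/..." URL shape; returns null if it isn't there.
function extractDocumentId(url) {
  const match = /\/d\/([A-Za-z0-9_-]+)/.exec(url);
  return match ? match[1] : null;
}

const libraryUrl = 'https://script.google.com/d/1fhEpIzmQEMDHoMxITMjkL-K1H4m0v36Se6Yt-f7oCSOyLIwSGzjjXBts/edit?usp=sharing';
console.log(extractDocumentId(libraryUrl));
```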

Step 3: Populate Your Database

(Note: if you work in philosophy, I have a(n incomplete) database I’m willing to share, consisting of five years of publications for about 10 generalist philosophy journals. The reason I haven’t done so here is that I don’t know about the legality of it: can one publish datasets culled from publicly available sources? If anyone knows the answer, and it’s yes, tell me and I will add a link to my database here and tweet it. I assume it’s fine but I want to make sure.)

So what we’re going to do here is make a google sheet, and then fill it with entries from the Microsoft Academic database. So go to docs.google.com, and hit the upper left dropdown menu to go to sheets, and create a blank sheet. Give it a sensible name, and copy down the document id (as before, it’s the long alphanumeric string in the url.) Let’s pretend that document id is ‘1ksksksks’. It is of course much longer.

Now go back to your script project, which we called Reviewerfinder_yourname. Think of a journal and a time frame, and how many results you want to get back. For example, let’s say I want to collect the author details for ten papers published in the journal Analytic Studies in 2019. Then this is what I type in between the braces {} which open after ‘function myFunction()’ and end at the end of the document:

ReviewerFinder.getFromAcademicKnowledge('1ksksksks','be5eb',"'analytic studies'", 2019,2019,10);

There are several weird things to note here. First, all entries in the MA database are in lower case, so everything you search for also must be in lower case. Second, note that for the journal name you have double quote marks then single ones. Third, if you want to search for only one year, you need to enter it twice. Fourth, all of these parameters are necessary; it would be better if at least the number of results were optional and defaulted to some number if omitted. All of these are small superficial untidinesses I just haven’t got round to fixing at the time of writing.
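Until those untidinesses are fixed in the library, you could paper over them in your own script. A minimal sketch: the helper name `buildFetchArgs` and the default of 10 results are my inventions here, not anything in the library.

```javascript
// Hypothetical helper: tidies the arguments before they go to
// ReviewerFinder.getFromAcademicKnowledge(). The default of 10 is my own choice.
function buildFetchArgs(sheetId, apiKey, journalName, startYear, endYear, numberOfResults) {
  // Lower-case the journal name and wrap it in the single quotes the function expects.
  const quotedJournal = "'" + journalName.trim().toLowerCase() + "'";
  // A missing end year means "just this one year".
  const lastYear = endYear === undefined ? startYear : endYear;
  // A missing result count falls back to a default.
  const count = numberOfResults === undefined ? 10 : numberOfResults;
  return [sheetId, apiKey, quotedJournal, startYear, lastYear, count];
}

const args = buildFetchArgs('1ksksksks', 'be5eb', 'Analytic Studies', 2019);
console.log(args);

// In your Apps Script project you could then call:
//   ReviewerFinder.getFromAcademicKnowledge(args[0], args[1], args[2], args[3], args[4], args[5]);
```

With this, you can type the journal name however you like, give a single year once, and leave off the number of results.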

More generally, the syntax of the function is as so:

ReviewerFinder.getFromAcademicKnowledge(sheet_id,api_key, journal_name, start_year, end_year, number_of_results);

Where you replace ‘sheet_id’, ‘journal_name’ etc. with your sheet, the journal you want to look for, and so on.

Click on the play button, and a screen will pop up saying ‘Authorization Required’. Sign in. It might say that there’s something unusual about your activity and try to do some 2-factor authentication. Do so. You’ll then get this scary screen:

The app referred to here is your app, which is indeed not verified. My library isn’t verified either. Click on ‘Advanced’ and, if you dare, click on “Go to Reviewerfinder_yourname (unsafe)”. Then you should see:

It says this because the app you’re making will add data to the Google sheet you made at the start of this step (that’s the first two points, I think), and connect to the Project Academic Knowledge API (the third point). If you click through, press the play button, then go to the just-mentioned sheet, you should see something like this:

Data, beautiful data! Note that all this is made up: there is no journal called Analytic Studies, and none of those authors and papers exist (why? It just felt weird to use real people in this guide).

You will see that, for ten papers, it will probably often return more than 10 results. That’s because papers are co-authored, and what we’re really looking for are authors. Do another query, and the results should be appended to the bottom of the sheet. Now you can collect data to your heart’s content just by feeding different parameters to the getFromAcademicKnowledge() function.
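To see why ten papers can give more than ten rows, here is a sketch of the flattening step. The field names (Ti, Y, J.JN, AA, AuN, AfN) are what I take the API’s response to look like, and the column order is an assumption of mine; the library itself may do this differently.

```javascript
// Assumed response shape: each paper has a title (Ti), year (Y), journal (J.JN)
// and a list of authors (AA) with a name (AuN) and affiliation (AfN).
function toAuthorRows(papers) {
  const rows = [];
  papers.forEach((paper) => {
    (paper.AA || []).forEach((author) => {
      // One row per author, repeating the paper details each time.
      rows.push([
        author.AuN,
        author.AfN || '',
        paper.Ti,
        paper.J ? paper.J.JN : '',
        paper.Y,
      ]);
    });
  });
  return rows;
}

// Made-up papers: one solo-authored, one with two authors.
const fetched = [
  { Ti: 'a semantic problem for stage theory', Y: 2019, J: { JN: 'analytic studies' },
    AA: [{ AuN: 'a author', AfN: 'some university' }] },
  { Ti: 'evidence and justification', Y: 2019, J: { JN: 'analytic studies' },
    AA: [{ AuN: 'b author' }, { AuN: 'c author', AfN: 'another university' }] },
];

const rows = toAuthorRows(fetched);
console.log(rows.length); // 3 rows from 2 papers

// In Apps Script, rows like these can be appended to a sheet with, for example:
//   const sheet = SpreadsheetApp.openById(sheetId).getSheets()[0];
//   rows.forEach((row) => sheet.appendRow(row));
```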

Step 4: Query Your Database

Now, remember why we’re doing this — to find referees. We now have, or have the capacity to have, lots of potential reviewers. But we don’t want to look through them manually, and so we turn to the second function in our library, findInMyDB().

Before doing that, we need to make one more Google document. This time it’s a text document. In this file, we’re going to be recording outputs, namely potential referees for papers. As usual, get the document id, and let’s say that it is ‘2ekekekek’.

Now recall we want to match papers which need to be refereed with authors. And I’m afraid the way we’re going to do so is perhaps disappointingly simple. What we’ll do is take a list of keywords associated with the to be refereed paper, and look for them in the titles of the papers in our database. If they are there, then we’ll output the details of the matching papers in our output text file.

Now delete or comment out the previous line, so that it doesn’t run again (comment out by putting // in front of it). For learning’s sake, let’s make things easy for ourselves. Let’s say that we have a paper that contains the following keywords: ‘phenomenal conservatism’, ‘justification’, ‘evidence’. And let’s say that your submission software assigns it the id ‘PAPE-2020-0020’. Then what we do with findInMyDB() is pass it the id of the sheet that contains our database, the id of the document where our output will go, and an array of our keywords, where an array of keywords is those keywords, in quotation marks, separated with commas, and enclosed in [ ] square brackets. We also pass the paper id.

It’s easier to give an example. Assuming you’ve commented out rather than deleted the first function, you should have something like this:

//ReviewerFinder.getFromAcademicKnowledge('1ksksksks','be5eb',"'analytic studies'", 2019,2019,10);
ReviewerFinder.findInMyDB('1ksksksks', '2ekekekek', ['phenomenal conservatism', 'justification', 'evidence'],'PAPE-2020-0020');

Where, as ever, the above are just dummy document ids, and you should put in the ones of your own documents, which will be much longer. Click the play button again, then go to your output document, and all being well you should see something like this:

Ta da! You can now bother this poor (and non-existent) fellow with a referee request. It’s not pretty, certainly, but it works and it’s easy and free and accessible to anyone, on any platform (and prettiness is, well, cosmetic, and can come later).
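For the curious, here is a sketch of the kind of thing a function like findInMyDB() might be doing under the hood: read the rows of the sheet, check each title against the keywords, and write out the matches. The column layout, the report format and the name `buildReport` are all assumptions of mine, not the library’s actual code.

```javascript
// Assumed column layout of the database sheet: author, affiliation, title, journal, year.
// The real library may lay things out differently.
const COL = { author: 0, affiliation: 1, title: 2, journal: 3, year: 4 };

function buildReport(rows, keywords, paperId) {
  const lines = ['Possible reviewers for ' + paperId + ':'];
  rows.forEach((row) => {
    const title = String(row[COL.title]).toLowerCase();
    // Which keywords appear somewhere in this title?
    const hits = keywords.filter((k) => title.includes(k.toLowerCase()));
    if (hits.length > 0) {
      lines.push(row[COL.author] + ' (' + row[COL.affiliation] + '), "' +
                 row[COL.title] + '", ' + row[COL.journal] + ' ' + row[COL.year] +
                 ' [matched: ' + hits.join(', ') + ']');
    }
  });
  return lines.join('\n');
}

// Made-up rows.
const sheetRows = [
  ['a author', 'some university', 'phenomenal conservatism and evidence', 'analytic studies', 2019],
  ['b author', 'another university', 'a semantic problem for stage theory', 'analytic studies', 2019],
];

const report = buildReport(sheetRows, ['phenomenal conservatism', 'justification', 'evidence'], 'PAPE-2020-0020');
console.log(report);

// In Apps Script the rows would come from the sheet and the text would go to the doc:
//   const rows = SpreadsheetApp.openById(sheetId).getSheets()[0].getDataRange().getValues();
//   DocumentApp.openById(docId).getBody().appendParagraph(report);
```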

Future Work, Further Reading

This is really meant to be just a worked example of the sort of things one can do. More competent coders can make better and nicer things based on this idea; hopefully it will at least serve as inspiration.

Let me end by saying some things that I think should be added, and some resources to help anybody who might care to do so. Here is my working list:

(*) Find a way to impute keywords to papers in your database. Consider the paper we found. It is about a topic in epistemology, that branch of philosophy devoted to the theory of knowledge. It would be cool if we could add keywords to the database reflecting this, such as ‘epistemology’, ‘knowledge’, ‘belief’, and so on. I think this is doable, and figuring out how to do so sounds like an interesting, fun challenge (I intend to try to solve it for the case of philosophy by using freely available categorizing data from philpapers.org).

(*) Add a GUI, in case people use this Apps Script program. Although I haven’t done it before using Apps Script, I think it’s relatively straightforward. For those with some coding background, you can begin with the tutorials here: https://developers.google.com/apps-script/guides/html. For more general information about how Google objects (sheets, docs, scripts, etc.) are represented in and manipulable by Google Apps Script, check out https://developers.google.com/apps-script/reference.

(*) Comment the code in the library, and deal with errors more gracefully. If you get a weird error message, double check that your function conforms exactly to the syntax given above.

(*) Add the capacity to search for inexact matches, so that if the keyword was ‘epistemology’, and the paper title contained ‘epistemological’, it would be a match.

(*) Write another simple function that takes as input sets of papers needing reviewers (say, represented in a google sheet, with columns for paper id and keywords) and outputs sets of sets of possible reviewers, to streamline the process. (Doing it one by one as here is needlessly slow; on the other hand, getting access to sets of papers needing reviewers is not automatable, at least for the software that my journal uses, ScholarOne. They say they have an API but after months of sending many people many emails I haven’t been given access. Your set-up might be more amenable to this, and if so you should definitely try to integrate something like this automatically into your workflow.)

(*) Add more ways of querying the Microsoft academic database, of which I’ve only really skimmed the surface. Some relevant documentation is relatively helpful: https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/home. I recommend someone wanting to use this just to read the docs and start playing about.
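Two of the items above, inexact matching and processing a batch of submissions, are easy enough to sketch. This is only a rough illustration: the prefix-based matching is a crude stand-in for proper stemming, the 5-letter threshold is arbitrary, and the function names and the ‘keywords separated by semicolons’ layout are assumptions of mine.

```javascript
// Length of the shared prefix of two strings.
function commonPrefixLength(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// True if every word of the keyword shares a long enough prefix with some title word.
// ASCII-only word splitting; minPrefix = 5 is an arbitrary threshold.
function fuzzyMatches(keyword, title, minPrefix = 5) {
  const words = (s) => s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const titleWords = words(title);
  const keywordWords = words(keyword);
  return keywordWords.length > 0 && keywordWords.every((kw) =>
    titleWords.some((tw) => commonPrefixLength(kw, tw) >= Math.min(minPrefix, kw.length)));
}

// submissions: rows of [paperId, 'keyword one; keyword two'], as they might sit in a sheet.
// Returns an object mapping each paper id to the authors of matching papers.
function findReviewersBatch(submissions, databaseRows, titleColumn, authorColumn) {
  const result = {};
  submissions.forEach(([paperId, keywordCell]) => {
    const keywords = keywordCell.split(';').map((k) => k.trim()).filter(Boolean);
    result[paperId] = databaseRows
      .filter((row) => keywords.some((k) => fuzzyMatches(k, row[titleColumn])))
      .map((row) => row[authorColumn]);
  });
  return result;
}

// Made-up database rows ([author, title]) and submissions.
const database = [
  ['a author', 'an epistemological puzzle about evidence'],
  ['b author', 'a semantic problem for stage theory'],
];
const submissions = [
  ['PAPER-1', 'epistemology; belief'],
  ['PAPER-2', 'stage theory'],
];

const batch = findReviewersBatch(submissions, database, 1, 0);
console.log(JSON.stringify(batch));
```

Here ‘epistemology’ finds the paper with ‘epistemological’ in its title, which exact matching would miss. Prefix matching will also throw up false positives (very short keywords match a lot), so the threshold is worth playing with.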

For those with the background, please feel free to make a copy of and improve the library, and find some way to make it known to me, on here, or via any other of the communication mediums modern life affords us (email, twitter, linkedin, myspace, tinder, etc.) And please tell me things that don’t work either in anything presented above or in my presentation thereof and I will endeavor to fix them asap.

Added 23 June 2020: From summer 2020 I’m going to move my occasional writing from medium to tinyletter. If you want to read more from me in your inbox, please consider signing up: https://tinyletter.com/mittmattmutt. I’ll post relatively infrequently, and hopefully interestingly, on the same sort of themes as the blog, so: popular philosophy/explainers, culture, literature, politics/economics, etc. I might also do things like brief reviews of books I read and so on.
