November 9, 2021, by Digital Research
Creating UoN Research Link
In this blog, we talk about how we created UoN Research Link and some of the technical and design decisions taken.
What is UoN Research Link?
UoN Research Link is a pilot tool with support until 2024. It lets you search for University of Nottingham colleagues according to research interests and expertise. It can help researchers identify collaborators from across the University’s UK campuses. You can search for any topic of interest and it will return a list of colleagues working in related areas, together with a summary of each researcher’s interests.
The underlying data
Our first challenge was to create a database of researcher interests. Not wanting to burden colleagues with supplying the data manually, we decided to use eStaff profile pages, currently the fullest set of information about research expertise at the University.
Not all information on the profile pages is relevant to UoN Research Link. For example, we ignore contact details, teaching expertise, and publications. The Research Summary and Expertise Summary sections are the main ones we use. Colleagues without an eStaff profile page or with no information in the aforementioned two sections will not appear in UoN Research Link.
The data is cleaned and formatted, for example by removing duplicate profiles (some colleagues are affiliated with more than one School or Research Group) and deleting hyperlinks, special characters (like bullet points), professional titles and degrees, as well as common words like ‘university’ or ‘research’. This is done using the programming language R, which has good packages for text manipulation.
The connection between UoN Research Link and eStaff profiles is not live (i.e. changes to eStaff profiles won’t immediately carry over to UoN Research Link), but the data collection process will be re-run every three months over the lifetime of UoN Research Link. This means that changes to research expertise or the arrival of new research colleagues will be captured, albeit only every quarter.
The next challenge was to present researchers’ interests succinctly. Keywords are good for this, but few colleagues have separately listed keywords on their profile pages. Again, we did not want to burden colleagues by asking them to supply keywords, so we generate these automatically.
We tested different methods for generating keywords. We tried extracting the most frequently occurring words in a given profile, but that returned mostly banal terms. The same approach but after removing the 1000 most common English words was also disappointing. Models based on machine learning held more promise. Both BERT transformer-based models and topic modelling NLP models such as Gensim worked well and generated sensible keywords distanced from each other (it can be unhelpful when all of a researcher’s keywords are very similar). However, we found that both these approaches were inflexible in the number of words used as keywords. We wanted an approach that would produce both single words and collocations (and parse these accurately). The keyword ‘synthetic organic chemistry’ is more helpful than just ‘chemistry’ or, worse, the unintelligible ‘synthetic organic’. In our case, the model that produced the best results was taken from the Python Keyphrase Extraction toolkit.
Each profile in the database is thus supplemented with up to six keywords, generated automatically. These are displayed under colleagues’ names when they come up in a search, providing users with a quick overview of a researcher’s interests and expertise.
Despite its accuracy, the keyphrase generator still occasionally produces sub-optimal keywords (it struggles on hyphened terms, for example), and we realise that colleagues may also want to provide their own keywords. The site administrators can therefore manually update any researcher’s keywords. Colleagues can visit the Contact page to arrange this.
Suggesting related researchers
We also wanted to identify researchers with similar profiles. Colleagues may be working in related areas and not be aware of this. Researchers new to the University may wish to seek out connections. Prospective post-graduate students might be looking for complementary supervisors.
To achieve the ‘Related Researchers’ function, we use a proxy measure, namely the similarity of one profile text to all others. Using the R package quanteda, we were able to calculate this. Quanteda was used to tokenise the corpus and generate a document-feature matrix. It also has handy functions for removing stop words, word stemming and other NLP processes. We use the cosine similarity measure to compare researcher’s expertise texts, although other measures are available (Jaccard, Sørensen-Dice, etc.). Here is an example output:
Name School similarity
Gosling, Simon Geography 1
Grundmann, Reiner Sociology 0.378
Jones, Matthew Geography 0.338
Dugdale, Steve Geography 0.295
Panizzo, Virginia Geography 0.268
It’s worth emphasising that only the profile texts are being compared here – other contextual information is not considered. We also set a threshold on the similarity measure so only colleagues with reasonably similar profile texts appear.
The entire process for formatting the data, generating the keywords, and calculating the similarity measures represents about 1200 lines of code (90% in R, 10% in Python). This is a work in progress. By separating the database generation from the user portal, we can continue to refine the code over the lifetime of UoN Research Link.
Creating the tool and user testing
Before engaging a development partner to make the website, we worked with the University’s User Experience (UX) team. They helped us think about how to present the data and how users navigate databases successfully. This led to the introduction of various filters as well as the option to search not only by research topic, but also by researcher name. You can still see one of the clickable mock-ups we created here. Our development partner then refined the look and feel, particularly around the aesthetics of the site. Below, for example, are some ideas around toggle buttons that didn’t make the final cut:
Our partner also introduced the ‘try searching related topics’ option (which proposes only topics that actually occur in the database).
Two aspects of this work phase were especially positive and impactful:
1) Adopting an agile approach allowed us to make iterative changes as the site development progressed. This meant that issues could be addressed quickly and workloads could be managed. It allowed us to finish ahead of schedule and with no major setbacks.
2) User testing of an early prototype threw many of our assumptions into relief. The user-testing exercise was absolutely invaluable and made the tool much more intuitive and accessible than we had originally designed.
Working with the development partner and with colleagues from different parts of the University during the creation of UoN Research Link was a pleasure.
So how does it all work?
In the background, it’s quite basic. A user can enter any topic as a search term. The site will then look through the expertise text and keywords of every researcher in the database. If it finds a match for the search-for term or terms, then that researcher will be displayed. Each relevant researcher is shown on a tile that contains their keywords. Clicking on ‘Related researchers’ at the bottom right of the tile will reveal colleagues with similar expertise profile texts.
What got left out
Technical, financial, and time constraints meant we had to pare down or sacrifice some hoped-for aspects of the tool. For example, we had considered options for visually representing connections between researchers in a network graph. Ultimately this proved too ambitious (although the idea survives in the Research Link logo).
Presenting search results in a logical order also proved more challenging than anticipated. We explored options such as prioritising those researchers for whom the searched-for term appears more frequently in their profile text. But since these calculations would need to happen within the website or would rely on a huge database of all possible permutations of searched terms, it would have greatly slowed down the tool. In the end, we opted for a random-order presentation of search hits.
We had also wanted to include colleagues based at the Malaysia and China campuses, but were unable to obtain the data in time. For now, UoN Research Link only has information on colleagues working at our UK campuses.
UoN Research Link is a pilot tool with support through to 2024. We will be monitoring usage and welcome all feedback, both around the tool itself as well as any fruitful collaborations it may inspire!
Future developments may include the addition of extra functionality (such as that described in the previous section) or the integration of UoN Research Link into another system (such as RIS). We look forward to seeing what transpires!
A word of thanks
The development of UoN Research Link was supported by the University’s ESRC Impact Acceleration Account and 3DI. We are grateful to both.
Click here to visit UoN Research Link: https://researchlink.nottingham.ac.uk/