At the beginning of my computer science PhD, I wrote a literature review and found it to be a challenging project. I’d like to share some of my insights from planning the review, sifting and organizing the material, and the challenges of this process.
Planning the Project
A literature review is a time-consuming project, but it offers a lot of benefits. I suggest first making a list of your desired and expected outcomes, ideally before reading mine. My expectations:
- make sure my research is unique/new
- identify interesting problems
- get a feel for the composition of scientific articles
- identify conferences and journals for publishing
- build an author database to track new results in the area
- collect ideas for research questions (that didn’t work very well)
- find a niche research topic (that didn’t work very well)
Almost all of these expectations have been fulfilled. Having seen a lot of papers in the field, I am now much more confident that what I’m doing hasn’t been done before. I also got a glimpse of the many variants of the problems I’m dealing with and the motivations behind them. One of the most valuable assets, however, is getting a feel for how good, standard research in my field is conducted and communicated. This includes language and structure, but also research questions, approaches, and standard setups for evaluation.
What didn’t work so well was generating a research problem from the literature, e.g., picking up some state-of-the-art approach and developing it further. However, that depends on you and your research area; I’m always astonished how different each PhD is in terms of goals, focus, supervision, and the student’s personality.
So… What’s a Survey?
Before starting, you might consult a guideline or definition from a journal in your research area. For instance, the ACM Computing Surveys journal defines it as:
a paper that summarizes and organizes recent research results in a novel way that integrates and adds understanding to work in the field. […] it emphasizes the classification of the existing literature, developing a perspective on the area, and evaluating trends.
You should also consider writing a structured review, i.e., following a methodological framework rather than pursuing the goal with your own ad-hoc method. Look out for guidelines on structured/systematic reviews and meta-analyses in your research area. For instance, in software engineering, the guidelines by Kitchenham and Charters are a much-cited reference. In the life sciences, you may want to look into the PRISMA guidelines, which provide a framework for systematic reviews that focus on randomized trials.
Finally, I highly recommend learning from other surveys in your field. Reading and dissecting other reviews gave me a clearer idea of what the overall paper and its scope could look like. I picked about five well-cited surveys from top journals and dissected their structure section by section, and in parts paragraph by paragraph. Abstracting from the content and looking at the lines of argument, the differentiation of research areas, etc. was so instructive that I wish I had done it much earlier.
Sifting the Material
Obviously, for a literature review, you need to read. A lot. I’d first like to share some ideas on how to find literature and then give some tips on how to read it.
Exploring the Citation Graph
Starting from a relevant paper, I looked for literature in three directions: ancestors, descendants, and siblings. Considering the literature as a web of papers linked by citations, ancestor papers are those referenced by the paper, descendant papers are those that cite the paper of interest, and sibling papers are other papers written by the same authors or published in the same conference or journal.
Ancestors show the origins of an approach or point out alternative approaches to a problem. In the beginning, I found it extremely useful to read through the complete reference lists of interesting papers. I would then stumble across seminal papers that almost all papers on that topic cite. One of my first steps was to identify a set of half a dozen such core papers. Apart from being obvious candidates for reading, these can also be used as indicators of whether a paper is relevant to your research question. I found this especially useful in the beginning, when almost everything seemed somehow relevant.
Descendants may show improved solutions to a problem, contain comparative evaluations of approaches, or outline new, related problems. Some of the well-cited descendants may also be survey papers or contain well-written related work sections that you can leverage. I found Google Scholar alerts on new citations of important papers very useful for keeping track of new (descendant) papers in the field.
I also found looking for sibling papers useful. First of all, looking through an author’s bibliography is likely to give you a more detailed view of the author’s specific research area. It also allows you to group lines of research around a core idea or approach. Skimming through the proceedings of conferences where an interesting paper appeared seems tremendously labor-intensive, but it gave me surprisingly good results. It also widens the view from a specific research question to a research domain. When going through bibliographies and proceedings, I found a multi-keyword highlighting browser plugin very useful (it searches for and highlights several keywords on a page), but the one I used for Firefox was recently discontinued.
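As a toy illustration of the three search directions, here is a minimal sketch in Python. The paper names, citation links, and author sets are entirely hypothetical, and a real workflow would pull this data from a citation database rather than a hard-coded dictionary:

```python
# Toy citation graph: citations[p] lists the papers that p cites (its ancestors).
# All paper names and links below are made up for illustration.
citations = {
    "survey_A": ["seminal_X", "method_Y"],
    "method_Y": ["seminal_X"],
    "new_Z":   ["survey_A", "seminal_X"],
}

# Authorship metadata for sibling lookups (also hypothetical).
authors = {
    "survey_A": {"Smith"},
    "method_Y": {"Smith", "Lee"},
    "new_Z":   {"Lee"},
    "seminal_X": {"Jones"},
}

def ancestors(paper):
    """Papers referenced by `paper` -- origins and alternative approaches."""
    return citations.get(paper, [])

def descendants(paper):
    """Papers that cite `paper` -- found by following the reverse edges."""
    return [p for p, refs in citations.items() if paper in refs]

def siblings(paper):
    """Other papers sharing at least one author with `paper`."""
    mine = authors.get(paper, set())
    return [p for p, a in authors.items() if p != paper and a & mine]

def core_papers(min_citations=2):
    """Papers cited by at least `min_citations` others -- candidate seminal works."""
    counts = {}
    for refs in citations.values():
        for r in refs:
            counts[r] = counts.get(r, 0) + 1
    return [p for p, c in counts.items() if c >= min_citations]
```

On this toy graph, `core_papers()` singles out `seminal_X`, the paper every other paper cites; on a real citation graph, the same counting idea surfaces the half-dozen core papers mentioned above.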
Practicing to Read
This may sound a bit ridiculous at first, but you will notice your reading getting faster, more focused, and more efficient over time.
One tip is to read actively and adaptively. Rather than suffering through the article from start to finish, stay in an active role: pose questions to the article and search for the answers.
By reading adaptively, I mean spending more reading time on specific sections, depending on which stage your review is in and what a specific article has to offer. For instance, in the beginning you might be interested in the use cases reported by papers; later you’ll skim these sections, as you quickly identify the common arguments. Some papers’ related work sections are gold mines; other papers excel in structure, language, methods, etc.
Another thing to watch out for is not getting addicted to collecting. It’s very tempting to find, save, and print piles of papers to read “later”. However, don’t mistake being busy for being productive. Try to plan time slots for reading (and just reading, nothing else). Take deliberate breaks to let the ideas sink in, and you’ll see how you come up with new ideas and connections.
Materializing your Reading Experience
Although you need to read a lot, reading seems to be only half of the work. Even though I was eagerly taking notes, filling out spreadsheets, drawing taxonomies, and maintaining a seemingly convincing overall structure in my head, I found that it was still a very long way to the first complete draft. I’d like to share some tips on starting to write early and managing the complexity of a growing collection of papers.
Force Yourself to Write
I think of writing in two flavors. One is private writing, which is mainly about collecting ideas, posing questions, and summarizing content for yourself. It gives you the freedom needed for creativity. The second is public writing, which is about working out ideas rigorously and explaining them in a comprehensible and bulletproof way to others.
Private writing is essential to cope with the many ideas, thoughts, and questions you’ll encounter during the project. For instance, force yourself to write a brief summary of every paper you read. This will put you in an active role again, since you have to select and recombine the relevant parts of a paper from your specific point of view. It is also extremely useful when you revisit a paper after some time; my own summaries help me remember a paper much better than its abstract. Also consider that revisiting a paper is a frequent task, e.g., when you need to extract new information because your survey design changed.
My advice is to attempt public writing as early as possible. Writing is a valuable tool for thinking. It is so much more than materializing your ideas on paper. The process of writing for others requires you to think an idea through: ordering and prioritizing its components, stepping back, looking at it from all angles, assessing its properties, credibility, and relationship to other ideas. The moment you write your first paragraphs intended for inclusion in the paper, you will be ten times as critical and observant as in your private writing. Note that I’m not talking about endless editing, rephrasing, and polishing. On the contrary, I recommend drafting paragraphs as bullet lists and consciously switching between roles such as the expert (that’s when you write), the editor, the reviewer, and the reader. You’ll find flaws in all of these roles and improve in many regards, which makes it an exhausting and sometimes painful, but also very rewarding process.
I have 275 papers in the collection I created for the literature review. To manage this pile of papers, I used a mix of features from my reference manager (I use Papers, which is good, but not as polished as I’d expect from a commercial product). The main ingredients are taking notes, indexing papers, tabulating findings, and keeping an overview.
Taking notes was the most important part of managing the complexity of the task. This includes the aforementioned paper summaries as memory aids, but also open questions and aspects of a paper that stood out to me. I also found it useful to note decisions, such as “excluded because…”, so I could recap and validate my decisions when the criteria changed. I also started to make notes on how I stumbled across a paper, e.g., “cited by X as a method for Y, criticized for…” or simply “found while searching Google Scholar for term Z”.
By indexing papers, I mean both adding keywords and rating them. To counter the problem of concept drift (keywords may change their meaning over time), I also like to add a short note about the reason I assigned the keyword and its specific meaning in this context. Keywords come in handy later when using the Papers search engine, which allows full-text searches or simply listing all publications with a specific keyword.
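A keyword index with drift notes can be modeled very simply. The keywords, papers, and reasons below are hypothetical examples, not entries from my actual collection:

```python
# keyword -> {paper: why the keyword was assigned}
# Storing the reason guards against concept drift: when a keyword's meaning
# shifts later, the note records what it meant at assignment time.
index = {}

def tag(paper, keyword, reason):
    """Assign `keyword` to `paper` along with a short justification."""
    index.setdefault(keyword, {})[paper] = reason

def papers_with(keyword):
    """List all papers carrying `keyword`, sorted by mnemonic."""
    return sorted(index.get(keyword, {}))

# Hypothetical example entries:
tag("17Cunha", "runtime-prediction", "estimates duration to place jobs in the cloud")
tag("07Smith", "runtime-prediction", "predicts job execution duration from history")
tag("17Cunha", "hybrid-cloud", "decides between on-premise cluster and cloud")
```

With these entries, `papers_with("runtime-prediction")` lists both tagged papers, while the stored reasons explain what the keyword meant for each one.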
Rating papers is a simple feature that lets you give a paper a score from 1 to 5 stars. This was surprisingly helpful, for instance because it allows sorting by quality and serves as a quick reminder when deciding whether to revisit a paper or whether to include it in or exclude it from the review.
I also maintained a spreadsheet listing the relevant properties of the papers. Although I didn’t find myself looking things up in the spreadsheet very often (mostly because I’d either remember them or could just as well consult my notes on the paper), I found the process of designing it very instructive. It also gave me a boost in the direction of reading actively, as outlined above. In the end, each row corresponds to the details of one publication. For example, one row looked like this:
- Mnemo: 17Cunha (publication year and first author’s name)
- ReferenceID: 58 (I have a linked spreadsheet listing the full reference)
- Goals: use cloud resources if own cluster is overloaded: “help users decide where a job should be placed considering execution time and queue wait time to access on-premise clusters”
- Input: Classical RM Metadata: User, Group, Queue, Requested Time […]
- Output: execution duration, queue time for local cluster (“on-premise cluster”)
- Method: similar to 07Smith; the execution duration estimate for the cloud is a function of the estimate for the local cluster (they propose two models)
- Method Category: Local Learning
- Data-aware: no
- Heterogeneity-aware: yes
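The same row could just as well live in a small structured record instead of a spreadsheet. Here is a minimal sketch with field names taken from the example above; the exact types, such as using booleans for the “-aware” flags, are my own assumption:

```python
from dataclasses import dataclass

@dataclass
class PaperRecord:
    mnemo: str             # publication year + first author, e.g. "17Cunha"
    reference_id: int      # key into a separate table holding the full reference
    goals: str
    inputs: str
    outputs: str
    method: str
    method_category: str
    data_aware: bool
    heterogeneity_aware: bool

# The example row from above, abbreviated:
row = PaperRecord(
    mnemo="17Cunha",
    reference_id=58,
    goals="use cloud resources if own cluster is overloaded",
    inputs="User, Group, Queue, Requested Time",
    outputs="execution duration, queue time for local cluster",
    method="similar to 07Smith; cloud estimate derived from local estimate",
    method_category="Local Learning",
    data_aware=False,
    heterogeneity_aware=True,
)
```

A structured record makes it easy to filter the collection programmatically, e.g., listing all heterogeneity-aware papers, at the cost of the quick ad-hoc editing a spreadsheet offers.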
One problem is that the scope and relevant characteristics changed gradually during the project. Revisiting all papers to extract the missing information on every change was not feasible (nevertheless, I frequently revisited the papers that had left an impression on me). On the other hand, if a characteristic turned out not to apply to all relevant papers, that helped to redefine the relevant characteristics, inclusion criteria, and scope of the paper.
Finally, to keep an overview of the material and the process, I used three techniques. First, I maintained a priority reading queue (by indicating reading priority with a keyword from 1 to 5). When I was accumulating papers with the highest reading priority, I knew it was time to stop collecting and start reading (this also helped me select what to read in my reading slots). Second, to structure the papers, I liked to print a few dozen of them and then categorize them into groups on the office floor. Third, I kept a list of search queries, the search engines I submitted them to, and the relevant hits. This helped me get good coverage and keep an overview of what I had been searching for and what results I had been sifting.
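The priority reading queue can be sketched with a heap; the titles and priorities below are made up:

```python
import heapq

# (priority, title) pairs; 1 = read first, 5 = read last. Entries are hypothetical.
queue = []
for priority, title in [(3, "survey_A"), (1, "seminal_X"), (2, "method_Y"), (1, "new_Z")]:
    heapq.heappush(queue, (priority, title))

# If too many top-priority papers pile up, stop collecting and start reading.
top = min(p for p, _ in queue)
backlog = sum(1 for p, _ in queue if p == top)

# Popping yields the reading order (ties broken alphabetically by title).
reading_order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
```

Here `backlog` is 2 (two priority-1 papers are waiting), which in my workflow would be the signal to schedule a reading slot rather than collect more.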
Challenges and Conclusion
I found my literature review project challenging in three ways. Writing a good review requires a lot of knowledge, but it also takes some skill to structure that knowledge and write it down for a scientific audience. In addition, this process may take months during which there is little tangible progress, and the amount of literature can sometimes be daunting.
My most important advice is to start drafting real paragraphs from day one. What’s the motivation behind your review? What is its scope?
After my first six months of collecting, reading, list-making, tabulating, and private writing, I believed that the review was roughly set up in my mind and I just needed to write it down. When I started public writing, I noticed I wasn’t even close. It took me another three months and a lot more deep thinking to arrive at a rough draft. Public writing forced me to think in a much more rigorous way and to clarify and double-check my assumptions.
All in all, writing a literature review gave me a tough first PhD year, but in the end, I feel thoroughly rewarded. I hope this wrap-up provides some help for those of you aiming to write a literature review yourselves. My paper is currently under review, so the story’s not over yet. Comments welcome!
References
B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” EBSE Technical Report, University of Durham, 2007.
C. Witt et al., “Predictive Performance Modeling in Distributed Computing using Black-Box Monitoring and Machine Learning,” in review.