Autonomous Non-Profit Organization of Higher Education
"Innopolis University"

GRADUATION THESIS (BACHELOR'S GRADUATION THESIS)

Field of Study: 09.03.01 "Computer Science"
Area of Specialization / Academic Program Title: "Computer Science"

Topic: SciLRtool: Online Tool Supporting All Stages of Systematic Reviews in Software Engineering

Thesis is executed by: Ginzburg Danil (signature)
Supervisor of Graduation Thesis: Konyukhov Ivan (signature)

Innopolis, 2021
Autonomous Non-Profit Organization of Higher Education
"Innopolis University"

GRADUATION THESIS (BACHELOR'S GRADUATION THESIS)

Field of Study: 09.03.01 "Computer Science"
Area of Specialization / Academic Program Title: "Computer Science"

Topic: SciLRtool: Online Tool Supporting All Stages of Systematic Reviews in Software Engineering

Thesis is executed by: Ginzburg Danil (signature)
Supervisor of Graduation Thesis: Sillitti Alberto (signature)

Innopolis, 2021
Contents

1 Introduction
   1.1 Domain Area and Applicability
   1.2 Problem Statement
   1.3 Proposed Solution

2 Literature Review
   2.1 Systematic Literature Reviews
       2.1.1 The Importance of Systematic Literature Reviews
       2.1.2 Why Do a Systematic Review?
       2.1.3 Differences of SLR from Conventional Literature Review
   2.2 Systematic Reviews in Software Engineering
   2.3 Software Tools Supporting Systematic Reviews
       2.3.1 What Can Be Automated?
   2.4 Related Works

3 Methodology
   3.1 Setting Up the Review
       3.1.1 Define the Question Type (PICO, PIT, PO)
       3.1.2 Define if SR or SM Will Be Performed
   3.2 Pi Scoping/Protocol Development
       3.2.1 Formulate the Review Question
       3.2.2 Write Protocol
       3.2.3 Devise Search Strategy
   3.3 Literature Searching
   3.4 Duplicate Checking
   3.5 Article Screening/Study Selection
   3.6 Quality Assessment
       3.6.1 Planning
       3.6.2 Conducting
   3.7 Data Extraction
       3.7.1 Planning
       3.7.2 Conducting
   3.8 Quantitative and Qualitative Syntheses of Results
       3.8.1 Qualitative Synthesis
       3.8.2 Quantitative Synthesis
   3.9 Generation of Documentation
       3.9.1 Protocol Reporting
       3.9.2 Final Review Reporting

4 Implementation
   4.1 Technology Adaptation
   4.2 Setting Up the Review
       4.2.1 Define the Question Type (PICO, PIT, PO)
       4.2.2 Define if SR or SM Will Be Performed
   4.3 Literature Searching
   4.4 Duplicate Checking
   4.5 Quality Assessment
       4.5.1 Planning
       4.5.2 Conducting
   4.6 Quantitative and Qualitative Synthesis
   4.7 Generation of Documentation
       4.7.1 Documentation Interface
       4.7.2 Publishing Evidence Synthesis

5 Evaluation and Discussion
   5.1 Evaluation
       5.1.1 Setting Up the Review and Protocol Definition
       5.1.2 Literature Searching
       5.1.3 Duplicate Checking
       5.1.4 Study Selection
       5.1.5 Quality Assessment
       5.1.6 Data Extraction
       5.1.7 Data Analysis
       5.1.8 Generation of Documentation
       5.1.9 Publishing Evidence Synthesis
   5.2 Discussion

6 Conclusion

Bibliography

A Existing Steps For Systematic Reviews
B PRISMA Documents
C SciLRtool Improvements over Parsifal
List of Tables

I  Related tools
II SciLRtool improvements over Parsifal, shortly described for each stage with corresponding information formats
List of Figures

3.1  Parsifal Tool: Review Details
3.2  Parsifal: Research Questions and PICOC
3.3  Parsifal literature searching; search results have no links to the articles
3.4  Parsifal duplicate checking. Two articles are identical, but the word "Disease" is capitalized in the second article; thus, the article will be marked as duplicated
3.5  Parsifal study selection tool. Red lines and red text are added to divide features logically
3.6  Parsifal article details example
3.7  Quality Assessment Questions and Answers in Parsifal and CADIMA
3.8  Conflict System Example: User 1 and User 2 answer the same question for the same included study differently
3.9  Parsifal data analysis; source-studies distribution example. Two studies per source were chosen
3.10 Parsifal data analysis; accepted vs. rejected number of studies for every source. Two studies chosen and one accepted per source
3.11 Parsifal data analysis; publication year example. Of three accepted studies, one was published in 2009, one in 2018 and one in 2019
3.12 Parsifal reporting
4.1  Quality Assessment checklist: new interface
4.2  Quality Assessment checklist: new modal window to add a quality question with answers
4.3  Quality Assessment checklist: settings for the main author
4.4  Quality Assessment: conducting
4.5  Quality Assessment: manual reassignment of an article
4.6  Quality Assessment: conflicts example
4.7  Interface for quantitative and qualitative synthesis
4.8  New reporting stage, export tab
4.9  New Browse navbar menu
5.1  The Likert scale applied in interviews
A.1  Existing steps for systematic reviews (some deviations possible) [9]
B.1  PRISMA flow diagram template
B.2  PRISMA report-assessment checklist
Abstract

A Systematic Literature Review (SLR) is a comprehensive literature review that summarises all available research relevant to a particular domain; it is applied to understand a domain and establish possible gaps in it. Consequently, several tools exist to support the process of conducting SLRs. We found that no existing tool provides support for all stages of an SLR in the Software Engineering area; thus, we decided to contribute to this field by creating a new tool called SciLRtool. Our tool combines the best practices of tools such as Parsifal and CADIMA and adds its own unique features. We evaluated our system by interviewing 11 people experienced with SLRs. According to the results, the experts rate SciLRtool as "useful" for their practice. However, the competitiveness of SciLRtool with regard to other tools is yet to be assessed.
Chapter 1
Introduction

This document examines the domain of Systematic Literature Reviews in Software Engineering and describes the implementation of a new tool supporting Systematic Literature Reviews: SciLRtool. It then discusses the obtained results and proposes future work.

1.1 Domain Area and Applicability

Many research works are available nowadays, and they differ in quality, contribution and scientific value. It becomes crucial to identify the research works most relevant to a specific problem. New research often starts with a literature review. Nevertheless, a literature review has little scientific value unless it is fair and thorough [1]. Recently, a new problem area has arisen that studies Systematic Literature Reviews (SLRs). An SLR is a secondary study that summarises, in a fair manner, all available studies in a particular research area. It is helpful for identifying existing gaps in a research domain and for examining the background against which a new research activity can be proposed [1]. We discuss SLRs in more detail in the Literature Review chapter 2.
Consider the following example. A team of post-graduate students studies agile development processes. They want to obtain a comprehensive understanding of this domain. They can do a thorough literature review by themselves: read a large number of papers and select the most relevant works from them. Alternatively, they can use the existing Systematic Literature Review for Agile Development Processes and User Centred Design Integration by Salah et al. [2], which summarises the most relevant works, including state-of-the-art solutions. This SLR gives a comprehensive overview of the domain area, groups studies by pre-defined classes, provides a quality assessment for every work presented and determines gaps in the domain area. In our example, the team of post-graduate students can quickly recognize that Lack of Documentation is the primary gap found by Salah et al. and the one needing further work.
1.2 Problem Statement

There are already some tools available that support researchers in doing Systematic Literature Reviews. The most notable examples are EPPI-Reviewer 4 [3] and CADIMA [4]. These tools provide automated solutions to different problem areas in SLRs: effective team collaboration, protocol and report generation, duplicate checking, quantitative data representation and others. However, some of them focus on particular features and provide powerful functionality for them (e.g. EPPI-Reviewer facilitates the quantitative and qualitative synthesis of data), while others are dedicated to specific domains (e.g. medicine).

From a thorough analysis of existing tools, we found that no single tool applied in the Software Engineering domain provides a solution to every existing problem area. The main reason is that the Software Engineering
domain drastically differs from the medical domain [1], for which the SLR methodology was initially developed. It is also important that Software Engineering is a comparatively young scientific domain.
1.3 Proposed Solution

In our project, we aim to develop a web tool supporting all stages of Systematic Literature Reviews in Software Engineering. For this purpose, we examined several existing tools in order to create a new one that combines their best qualities. We therefore introduce SciLRtool, a tool based on Parsifal [5], which was created single-handedly by Vitor Freitas, focuses on the Software Engineering domain and provides open-source code. Parsifal features literature searching and facilitates the quantitative synthesis of results.

We aim to create a tool that incorporates the best practices of Parsifal and CADIMA and also offers its own unique features. We utilize the best solutions from CADIMA [4], a giant in the world of SLRs, which provides the ability to develop SLRs in any domain. CADIMA features the quality assessment of research works and the generation of documentation, with publishing of documents so that they are publicly available. We consider all stages and approaches of the SLR in the Methodology chapter 3. In the Implementation chapter 4, we describe the development process of SciLRtool in detail.
Chapter 2
Literature Review

This chapter describes the field of systematic literature reviews, its applicability in the software engineering discipline and the software tools designed to support it. Section 2.1 and its subsections are dedicated to systematic literature reviews: their importance (2.1.1), their difference from conventional literature reviews (2.1.3) and the reasons to undertake them (2.1.2). The second section, 2.2, covers the applicability of systematic reviews in the software engineering field. The third section, 2.3, explains the need for an automated systematic review process, gives a brief overview of the review process and discusses which parts of it can be automated (2.3.1). Finally, section 2.4 discusses the existing tools that support systematic reviews.

2.1 Systematic Literature Reviews

"A systematic review attempts to collate all the empirical evidence that fits pre-specified eligibility criteria to answer a specific research question" [6]. A systematic review is a secondary study that summarises all available studies in a particular research area.
2.1.1 The Importance of Systematic Literature Reviews
Every research project starts by examining a research area and writing a literature review chapter. However, if the literature review is not fair and thorough, it is of little scientific value. This problem necessitates a systematic approach to literature reviews for them to be fair. Such an approach is straightforwardly called a Systematic Literature Review. An SLR is fair, and is seen to be fair, because it requires researchers to follow a predefined protocol and search strategy. For example, the search strategy is formulated so that every reader of a paper reporting a systematic review can assess the completeness of the search. Most importantly, researchers who undertake a systematic review must report all research that does not support their chosen research hypothesis as well as the research that does. Otherwise, a systematic review is unfair and considered pseudoscience.

"True ignorance is not the absence of knowledge, but the refusal to acquire it."
— Karl R. Popper
2.1.2 Why Do a Systematic Review?

There are many reasons to perform a systematic review. First, to review and identify current and ongoing studies, in order to indicate specific gaps in knowledge in a research area or a lack of evidence. Second, to summarise the up-to-date evidence about a specific methodology or technology; this might be used, for example, to provide a background on those methodologies or technologies in order to position a new research activity. Although writing systematic literature reviews is a highly time-consuming process, it is often rewarding: systematic reviews allow researchers to identify priorities for further research.
2.1.3 Differences of SLR from Conventional Literature Review

The main difference between a systematic review and a conventional literature review is the review protocol, which specifies the research question and the methodology of performing the review. Furthermore, systematic reviews specify particular search strategies so that readers can assess the completeness of the search and replicate it if needed. Systematic reviews also require inclusion and exclusion criteria, because not all the studies found by the search are helpful for the research purpose. Besides inclusion and exclusion criteria, systematic reviews are more flexible in terms of the information extracted from the studies; they also specify quality criteria by which to evaluate the studies.
2.2 Systematic Reviews in Software Engineering

The systematic literature review is one of the primary methodologies of Evidence-Based Software Engineering [8]. A systematic review is an evidence-based approach that originates from the medical field. However, the medical systematic review approach is not directly appropriate for software engineering researchers. Since we concentrate on the software engineering field, we follow the systematic review protocol for software engineering that is well defined by Kitchenham et al. (2007) [1]. Budgen et al. (2006) [7] conducted several interviews with researchers to compare evidence-based approaches in different fields, and the results showed that the agreement between the clinical medicine methodology and the software engineering methodology is only 0.17 [1]. This experiment demonstrates how different software engineering is nowadays from the medical area.
2.3 Software Tools Supporting Systematic Reviews

Software tools have been developed to support researchers during the systematic review process. (They are also applicable to systematic maps, which are similar to systematic reviews in terms of the rigorous protocol and search strategy; however, systematic maps do not provide quantitative and qualitative analyses of the studies [8].) Software tools increase the reviewing team's efficiency throughout the conduct of their review. Nevertheless, there are potential downsides: some tools are aimed at particular research disciplines (e.g. medicine) and are not applicable to others. It is also possible that they are not open-access. It is worth mentioning that some software tools are oriented solely towards systematic maps and do not provide systematic review features.

Kitchenham et al. published an interesting document in 2008 that shows the systematic review activity in software engineering from 2004 to 2008. In this period, 20 systematic-review-related papers were published; however, only half of them positioned themselves as related to evidence-based software engineering [6]. Moreover, the number of studies done every year is steady, and their quality is consistently improving. Although many researchers prefer to undertake informal and manual literature reviews, the need for an online tool supporting SLRs is growing.
2.3.1 What Can Be Automated?
The review protocol can be split into three main stages: Planning the Review, Conducting the Review and Reporting the Review. Software tools should deal with the Conducting and Reporting stages. One example of a systematic review conducted in the software engineering field that follows the protocol specified by Kitchenham et al. (2007) [1] is Salah et al. (2014) [2]. However, the authors conducted their review manually, with no use of external tools. It is clear how much hard work was done during the review, since the authors provide detailed manual explanations of the data extraction/synthesis methods and of the search results from digital libraries, conference proceedings and journals. All those methods and search results can be auto-generated by special tools such as the one described in this document.

To understand which parts of Conducting the Review and Reporting the Review can be automated, it is necessary to dive into the systematic review process (see Figure A.1).
The process itself is partly technical and partly creative [9]. For example, the creation of the research question(s) and the review protocol is a creative task: that is the part of the review where a team of reviewers should use their experience and creativity. Usually, peer review is used in developing the protocol to ensure objectivity and fulfilment of the review question(s) [6].

Once the protocol is defined, it can be executed by a machine [10]. Tasks are ordered in such a way that manual tasks come first and automated tasks come second. It is also beneficial for reviewers to monitor and assure the quality of the review during the execution of the technical tasks. Some tasks are impossible, or seem to be impossible, to automate. However, the development of software tools is incremental [9], and what seems a fantasy now might be implemented in a few decades.
2.4 Related Works

Our discussion of related work is based on the related work published by Kohl et al. (2018) [4], since it gives a complete and comprehensive overview of the available online tools. That publication describes a new tool supporting systematic reviews and systematic maps called CADIMA. The authors did a great job of searching for existing solutions. Their search strategy includes:

• searches via online databases;
• searches via links on relevant websites;
• searches of relevant publications.

Excluding tools that are not free to use, currently in development or no longer available, 22 software tools were identified. However, only 3 of the 22 tools are designed primarily for the Software Engineering field, and nine are suitable for any research field. The rest are designed for medical science and experimental animal studies and are not considered related tools for this document. Thus, 12 tools suit the definition of related tools. Nevertheless, 5 of them are not available online, meaning they are downloadable applications. Finally, we are left with a total of 7 tools similar in terms of purpose and availability (see Table I).
Table I: Related tools.

Name                 Field  Stages                           Open source
CADIMA [4]           Any    Qu, Pi, Du, Sc, Co, Cr, Do       No
Colandr              Any    Pi, Se, Du, Sc, Co, Sy, Do       Yes
DistillerSR          Any    Se (PubMed), Du, Sc, Co, Sy, Do  No
EPPI-Reviewer 4 [3]  Any    Se, Du, Sc, Co, Cr, Sy, Do       No
PARSIFAL [5]         SE     Pi, Se, Du, Sc, Co, Sy           Yes
Rayyan [11]          Any    Pi, Se, Du, Sc                   No
SESRA                Any    Qu, Pi, Sc, Co, Sy, Do           No
The tools differ in the features they support and, most importantly, in the stages¹ of SR they cover. It is crucial to emphasize that none of them supports all stages. Some of them concentrate on particular features, such as machine learning during the screening, data extraction or synthesis stages, and bias assessments. For example, EPPI-Reviewer 4 [3] provides the article screening feature, and DistillerSR provides capabilities for managing different character sets. Several existing solutions are not free to use (e.g. DistillerSR) and offer subscription plans. Furthermore, all of the solutions are designed in English, but not all of them provide adequate documentation and support (e.g. Rayyan [11] has only an online form as user support).
¹ "Stages of a systematic review: Qu setting up the review, with question formulation and/or stakeholder engagement; Pi scoping/pilot study, protocol development (e.g. PICO elements specified); Se literature searching (e.g. via integration with publication databases); Du duplicate checking (e.g. automated marking of duplicates, or identification of potential duplicates for manual checking); Sc article screening/study selection; Co facilitates data coding/tagging and extraction to support meta-analyses; Cr critical appraisal/risk of bias assessments; Sy facilitates quantitative/qualitative syntheses of results; Do generation of documentation/output of text, figures or tables to assist with report writing" from Kohl et al. (2018), Table 1 [4].
Chapter 3
Methodology

Among several publicly available open-source SLR tools (namely, Colandr and Parsifal), we chose Parsifal [5] as the starting point for SciLRtool. It is a system supporting Systematic Literature Reviews, dedicated to the Software Engineering field, that follows the Kitchenham et al. [1] protocol (see Figure 3.1). It implements many features and supports many SR stages: Pi, Se, Du, Sc, Co, Sy (see Table I).

Along with Parsifal, CADIMA [4] supports the following SR stages: Qu, Pi, Du, Sc, Co, Cr, Do (see Table I). The synthesis of these tools covers all existing stages of SR: Qu, Pi, Se, Du, Sc, Co, Cr, Sy, Do. Furthermore, in contrast with Parsifal, CADIMA provides better usability: help text for features, a more flexible protocol setup and more. Therefore, SciLRtool is a synthesis of these two tools and implements the best parts of each.

The list of all SLR stages (see Table I), with the stages already present in Parsifal marked with a check, is as follows:

–  Setting up the review
✓ Pi scoping/protocol development
✓ Literature searching
✓ Duplicate checking
✓ Article screening/study selection
–  Quality Assessment
✓ Data extraction
✓ Quantitative and qualitative syntheses of results
–  Generation of documentation

Figure 3.1: Parsifal Tool: Review Details

In the following sections, we will discuss every stage in detail.
3.1 Setting Up the Review

The planning part of any evidence synthesis consists of setting up the review and defining the protocol.

According to the CADIMA analysis, Parsifal lacks the setting-up-the-review stage (the Qu stage). However, this is only partially accurate, since Parsifal implements a small number of features for this stage. CADIMA defines the following features of the Qu stage (features present in Parsifal are marked with a check):

✓ Invite registered users to become part of the review team
✓ Define the title of the review
–  Define the question type (PICO, PIT, PO)
–  Define if an SR or an SM will be performed
3.1.1 Define the Question Type (PICO, PIT, PO)

Parsifal allows users to define PICOC: population, intervention, comparison, outcomes and context. Nonetheless, other types of research questions exist and require different approaches, such as PIT or PO, and they apply in different circumstances. For example, if a research question is related to the accuracy of a test method, PIT (population, index test and target condition) should be used. PO (population and outcome) captures the critical elements when questions are related to outcomes for a population [12].
3.1.2 Define if SR or SM Will Be Performed

In contrast with Parsifal, CADIMA provides tools for the Systematic Mapping approach [8], which is a simplified way of doing a systematic review. The Systematic Mapping option allows users to skip some stages (e.g. Quality Assessment). This option is implemented in SciLRtool: it adds more functionality to the system and expands the target user base.
3.2 Pi Scoping/Protocol Development

This stage is essentially the preparation part of the review (see Figure A.1). It consists of the tasks "formulate review question", "find previous SR", "write the protocol" and "devise search strategy". Although all those tasks must be undertaken, only a few of them can be automated. Parsifal allows users to record the review question, write the protocol and devise the search strategy.
3.2.1 Formulate the Review Question

There are many possible ways to formulate the review questions [13]. Parsifal is a tool dedicated to Software Engineering; therefore, the review topic is constrained to the SE area. Beyond that, factors such as proficiency in the area and personal interest are common drivers [13]. Research questions must be explained in detail to avoid ambiguity and to help with the quality assessment stage. PICO (population, intervention, comparison, outcomes) elements are recommended as the default specification of any research question [14].

Parsifal has an elegant way of recording the research question and the PICOC keywords (PICOC is an extended version of PICO with Context); see Figure 3.2.

Figure 3.2: Parsifal: Research Questions and PICOC
3.2.2 Write Protocol

After formulating the review question and finding previous SRs to establish that the current SR is needed, the next step in planning the review is protocol writing. This task requires expertise in the research area and creativity, because researchers need to have a general idea about the research outcomes. To ensure the unbiasedness and consistency of the review, peer review is used.

Parsifal implements the following features of writing the protocol:

• objectives
• selection criteria
3.2.3 Devise Search Strategy

A good search strategy is not limited to easily accessible studies. It describes what keywords will be used in the searches, which databases will be searched, and how non-database sources will be tracked and checked for trustworthiness [6]. It is also good practice to have the search strategy peer-reviewed before searching.

Parsifal provides the ability to write the following parts of the search strategy:

• keywords and synonyms: users can specify keywords, synonyms, and how they relate to PICOC.
• search string: users can define a search string using words, the boolean operators AND and OR, parentheses to logically separate the keywords and synonyms, and double quotes for composite words (see the sketch after this list for how such a string can be assembled).
• sources: users can specify databases (integrated databases: El Compendex, IEEE Digital Library, ISI Web of Science, Science@Direct, Scopus, Springer Link) and other sources.

Overall, Parsifal has an excellent implementation of the protocol definition stage; thus, SciLRtool has not changed it.
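To make the search-string format concrete, the following minimal Python sketch shows how such a string can be assembled from keyword groups (one group per PICOC element); the function and data layout are illustrative and are not taken from Parsifal's code base.

    def build_search_string(keyword_groups):
        """keyword_groups: a list of lists; each inner list holds a keyword
        and its synonyms (typically one group per PICOC element)."""
        def quote(term):
            # Double quotes keep composite words together, as in Parsifal.
            return f'"{term}"' if " " in term else term

        # Synonyms within a group are OR-ed; the groups are AND-ed together.
        or_blocks = ("(" + " OR ".join(quote(t) for t in group) + ")"
                     for group in keyword_groups)
        return " AND ".join(or_blocks)

    groups = [
        ["agile development", "agile process", "scrum"],
        ["user centred design", "UCD"],
    ]
    print(build_search_string(groups))
    # ("agile development" OR "agile process" OR scrum) AND ("user centred design" OR UCD)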
3.3 Literature Searching

Nowadays, mainly online databases, such as Science@Direct or the IEEE Digital Library, are used. However, grey literature and other sources may be used as well [15]. For a literature review to be systematic, all the relevant studies must be covered; thus, multiple databases have to be searched. However, interoperability among databases is relatively rare [9]. For example, different databases may support different query languages (e.g. AND, OR and NOT), different syntax for referencing specific fields, and different operators (e.g. ADJ or NEAR). Given these factors, researchers may struggle with literature searching.

Parsifal allows users to define a unique search string per source. In addition, Parsifal helps users search for scientific studies. It is integrated with Elsevier: "Elsevier is a leader in information and analytics for customers across the global research and health ecosystems" [16]. Elsevier provides an API for searching over 500,000 articles published annually in 2,500 journals. Although it is not perfect and has good alternatives, such as AIP, IOP, or Springer [17], it still covers a tremendous number of articles and provides fast and simple API endpoints.
Parsifal uses two APIs from Elsevier: the Scopus API and the Science@Direct API.

• Scopus is an abstract and citation database that includes trade publications, conference proceedings, patent records, peer-reviewed literature and websites. It has the cited references of studies from 1996 onward and provides author and article citation data [18].

• Science@Direct is a large bibliographic database that provides over 18 million pieces of content from more than 4,000 journals and 30,000 e-books from Elsevier [19]. Access to the full text requires a subscription; however, Science@Direct provides open access to some studies.

Since Parsifal is integrated with Scopus and Science@Direct, users can find the studies most relevant to their research in one place, without using any systems other than Parsifal. However, a noteworthy drawback we discovered is that the results returned by the search are plain text and not clickable (see Figure 3.3): users need to find the already-found articles once more on the Internet. Although the Elsevier APIs provide links to the articles, Parsifal does not include them in the search results.
Figure 3.3: Parsifal literature searching; search results have no links to the articles
3.4 Duplicate Checking

The purpose of duplicate checking is to detect two separate reports of the same study. This step must be undertaken whenever the obtained citations are combined [20]. Duplicates appear due to variations in the indexed metadata (e.g. the DOI, ISBN and page numbers might not be included) or typos (in the article title or journal name).

In case the same study is reported more than once (due to variation in author lists, titles or different journals), all those reports should be cited but marked as one trial in the meta-analysis [21]. Citation data are not enough to detect such duplicates, so the article's text is required.
To deal with duplicates, Parsifal has only a basic duplicate-checking engine. It checks whether some of the included articles have the same title-based slug, which detects differences in case (lowercase/uppercase), unnecessary or wrong punctuation marks, and extra white space (see Figure 3.4). A minimal sketch of this kind of comparison is shown below.
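The sketch uses the slugify helper from Django (the framework Parsifal is built on); the wrapper function is ours, not Parsifal's.

    from django.utils.text import slugify

    def same_title_slug(title_a, title_b):
        # Slugs lowercase the text and strip punctuation and extra whitespace,
        # so case and punctuation variants map to the same slug.
        return slugify(title_a) == slugify(title_b)

    print(same_title_slug("Predicting Heart Disease", "predicting heart Disease!"))  # True
    print(same_title_slug("Predicting Heart Disease", "Predicting Hart Disease"))    # False: a typo defeats the slug check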
While normalizing case is a required step in detecting duplicates, a matching technique is also needed to detect typos and misspellings in article titles [20]. According to Elmagarmid et al. [20], there is a vast number of such techniques, split into the following groups: character-based similarity metrics, token-based similarity metrics, phonetic similarity metrics and numeric similarity metrics. Elmagarmid et al. reference the work of Bilenko et al. [22], who compare the effectiveness of different metrics and conclude that the SoftTF.IDF metric, a token-based similarity metric, works better than any other metric overall. Although Bilenko et al. emphasize that no single metric is appropriate for all data sets, SoftTF.IDF has shown itself to be the best. Thus, SciLRtool adopts it.
Figure 3.4: Parsifal duplicate checking. Two articles are identical, but the word "Disease" is capitalized in the second article; thus, the article will be marked as duplicated
3.5 Article Screening/Study Selection

This is the so-called appraisal stage, corresponding to the tasks "screen abstracts" and "screen full text" in Tsafnat et al. [9] (see Figure A.1). These tasks aim to exclude all irrelevant studies. When the literature searching part is done right, the vast majority of articles are usually removed here [6].

In the first part of study selection (the screen-abstracts task), only titles and abstracts are used to exclude irrelevant studies. Usually, this part excludes the most articles. In the second part (the screen-full-text task), the entire text of the articles not excluded by the first task is used to select studies.

Parsifal provides a uniform tool supporting these two tasks simultaneously. This tool has a rich feature set, as shown in Figure 3.5. The figure is divided into logical blocks from 1 to 6:
1. The first block allows filtering by a chosen source. The default is All Sources, which aggregates articles from all sources.

2. The second block holds a button that opens a modal dialogue to find and resolve duplicates (see section 3.4) and a button that exports articles in .xls format as a table with all relevant fields (see item 6 for more details).

3. The third block allows choosing the following actions to perform on articles: mark as accepted, mark as rejected, mark as duplicated and remove selected. The chosen action is performed after clicking the "Go" button.

4. The fourth block allows filtering articles by their status: accepted, rejected, unclassified and duplicated.

5. The fifth block holds the table of articles with only the most important subset of fields. Users can select/deselect all articles or a particular article to perform some action. It is also possible to sort the articles by any field.

6. The final block appears with a more detailed configuration whenever an article is clicked (see the example in Figure 3.6). It allows editing all the meta fields of the article: status (e.g. accepted, rejected), selection criteria (either inclusion or exclusion criteria predefined in the protocol, applied to the article), title, abstract, year (publication year), author, keywords, author keywords, BibTeX key, journal, document type, pages, volume, DOI, URL, affiliation, publisher, ISSN, language and note. It is also possible to leave comments on the article, open the article's URL by clicking the "External link" button in the upper-right corner, go to the previous/next article according to the table ordering and save the edited article.
Figure 3.5: Parsifal study selection tool. Red lines and red text are added to
divide features logically.
Parsifal’s implementation of the study selection stage is satisfactory and
does not require further improvements in SciLRtool.
3.6 Quality Assessment

According to the CADIMA analysis, Parsifal does not have a critical appraisal stage (the Cr stage). However, this is not entirely true, because Parsifal has a practical, albeit rather weak, implementation of this stage (Parsifal uses the term Quality Assessment instead of Critical Appraisal). In contrast, CADIMA has more flexible and advanced settings, but it does not implement the quality assessment itself. In other words, CADIMA has the best quality assessment in terms of planning, and Parsifal has the best quality assessment in terms of conducting. As already mentioned, SciLRtool aims to implement the best parts of each.
Figure 3.6: Parsifal article details example
3.6.1 Planning

To define the quality assessment checklist, users need to create quality assessment questions and corresponding quality assessment answers; together, they are crucial to a Systematic Literature Review (Systematic Mapping does not require this stage) [23]. Parsifal has a simplified way of defining QA questions and answers: users define questions and answers, and a single set of answers applies to every question. CADIMA, by contrast, defines a separate answer set for every question, which is a significant structural difference; moreover, it greatly expands the system's flexibility (see Figure 3.7).

Figure 3.7: Quality Assessment Questions and Answers in Parsifal and CADIMA

In addition, CADIMA provides an excellent possibility for the main author (review coordinator) to nominate other team members to be involved in the quality assessment, which allows splitting the work between team members. The key features implemented in SciLRtool are outlined below:

• Users can give each quality question its own set of answers.
  – Users can copy an existing set of quality answers to a new quality question.
• The main author can nominate (either manually or automatically) team members to be involved in the quality assessment.
• Team members nominated by the main author can assess the corresponding included studies.
3.6.2 Conducting

Parsifal allows a group of researchers to assess the included studies concurrently. Nonetheless, different team members may assess the same study and disagree about a particular QA question and its answer. For such situations a conflict system is required: it records a conflict whenever a QA question has been answered differently by multiple persons. Besides creating conflicts, the system should also offer the ability to resolve them (see Figure 3.8). Thus, the new features implemented in SciLRtool are:

• Create conflicts
• Resolve conflicts
Figure 3.8: Conflict System Example: User 1 and User 2 answer the same
question for the same included study differently
3.7 Data Extraction

Data extraction is the identification of the primary information in the text of articles. It is one of the most time-consuming steps of systematic literature reviews. Often, the relevant information is placed in graphs, tables or images, and it should be extracted as accurately as possible. Usually, two researchers perform the extraction and then resolve conflicts [9]. The automation potential of this task is low. However, it is still possible to partially automate data extraction [9]; for example, ExaCT is an algorithm that automatically highlights the most relevant information, which helps reduce the text size and thus saves time when performing the extraction task [24].

Nevertheless, automating data extraction requires the text of the articles, but Elsevier provides the full text of articles only by subscription. Parsifal is a non-sponsored project; thus, it uses free APIs and does not access the text. SciLRtool is also a non-sponsored, research-oriented project, and it inherits this limitation from Parsifal.

Although Parsifal cannot automate data extraction, it does help extract information by providing a user-friendly interface. Parsifal logically divides data extraction into two parts: planning and conducting.
3.7.1 Planning

Here, a reviewer defines which fields of the studies will be extracted in the conducting part and what their types are (integer, float, string, boolean, date, select-one field and select-many field).
3.7.2 Conducting

In the conducting part, a user extracts the data by hand and writes it into the respective fields of each article. Additionally, Parsifal can mark articles as done or undone, and it can sort articles by the done/undone marking. Finally, when the data extraction task is finished, users can download an XLS file with a table of the extracted data.
3.8 Quantitative and Qualitative Syntheses of Results

The synthesis of results is one of the essential stages of an SLR. The synthesis leads to the SLR objective: analyzing the current state of the research area and identifying gaps.
3.8.1 Qualitative Synthesis

Qualitative, or narrative, results, such as population, intervention, comparison, outcomes, context (PICOC), sample sizes and study quality, should be presented in a manner consistent with the review question. Tables should be organized to show the differences and similarities between study outcomes. It is crucial to determine whether outcomes from studies are consistent with one another (i.e. homogeneous) or inconsistent (i.e. heterogeneous) [1].

Parsifal neither automates qualitative synthesis nor provides any interface to support it. CADIMA supports neither quantitative nor qualitative synthesis, but it provides an interface to upload the reviewer's files corresponding to the synthesis. This approach is expected, since automating data synthesis is hard and currently beyond the capabilities of available ML and NLP tools [25]. Moreover, according to Shelby and Vaske [26], the analysis depends on the personal opinions of the reviewers, the reviewers' proficiency in the research area and the study purpose. It becomes clear that qualitative synthesis is highly dependent on the reviewers' team. Thus SciLRtool, following CADIMA's recipe, allows reviewers to upload their synthesized data in the form of a DOCX file.
3.8.2 Quantitative Synthesis

Additionally, according to Kitchenham et al. [1], quantitative information should also be presented in the form of tables; this includes:

• intervention sample size;
• intervention effect size with errors;
• intervention mean values and the confidence interval for the difference between mean values;
• effect units used for measuring.

Parsifal implements quantitative synthesis of data such as publication year, source-studies distribution and the accepted vs. rejected number of studies for every source, in the form of interactive figures (see Figures 3.9, 3.10 and 3.11 as examples).

Figure 3.9: Parsifal data analysis; source-studies distribution example. Two studies per source were chosen.

Figure 3.10: Parsifal data analysis; accepted vs. rejected number of studies for every source. Two studies chosen and one accepted per source.

Figure 3.11: Parsifal data analysis; publication year example. Of three accepted studies, one was published in 2009, one in 2018 and one in 2019.
While these figures are helpful, Parsifal lacks a feature to export them to a file in case reviewers wish to put the figures in their report. SciLRtool fixes this, allowing users to export figures in PNG format.

Furthermore, the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) Statement [27] provides a four-phase flow diagram. It aims at improving the quality of systematic reviews and of the quantitative synthesis of results (see Figure B.1).

To generate such a diagram, SciLRtool needs to know how many studies are included in the quantitative and qualitative syntheses (the remaining numbers can be obtained from the system database). As automating this task is currently impossible, reviewers enter those numbers themselves and provide the data synthesis files. Once the system has all the numbers, SciLRtool users can download the flow diagram with all the numbers arranged accordingly.
3.9 Generation of Documentation

Parsifal supports the reporting stage: users can download a report that includes the stages or steps selected by a reviewer. Instead of dividing the output into report types, such as protocol, reference list, selection criteria, final review and others, Parsifal lets its users toggle the stages and steps to be exported in the DOCX file (see Figure 3.12). This approach is a simple and elegant way of producing a report, and it is carried over to SciLRtool.
Figure 3.12: Parsifal reporting
However, Moher et al. [28] propose a guideline for protocol and review reporting. The guide enumerates the essential aspects and steps a researcher should complete for a report to be comprehensive. After comparing the proposed steps with the Parsifal features, we found that some steps were missing. The missing steps fall into two groups: protocol reporting and final review reporting.
3.9.1 Protocol Reporting

The steps enumerated below belong to protocol reporting; they were identified as missing in Parsifal and integrated into SciLRtool:

• Background. The background explains why the user's study is important and how it can contribute to the field. It describes the role of commissioners and other stakeholders and then logically leads to the study's primary question. The background is different from the Project Description stage (already present in Parsifal), where a formal declaration of the project is written.

• Search Strategy. A draft of the search strategy that will be used for at least one electronic database, including planned limits. The search strategy should be transparent, such that it can be repeated.

• Scoping Exercise. An estimation of the comprehensiveness of the search.

• Study Inclusion Criteria. The rationale for the study inclusion/exclusion criteria.

• Study Screening Mode. How the inclusion/exclusion criteria will be applied.

• Quality Assessment. How studies will be assessed.

• Quality Assessment Mode. How the quality assessment questions will be applied, and how many team members will be involved in the appraisal.

• Data Extraction Strategy. How the data from the included studies will be collected and recorded.

• Data Analysis. How the collected data will be analyzed and synthesized.
3.9.2 Final Review Reporting

The following steps, which belong to final review reporting, were also implemented in the reporting stage of SciLRtool:

• Quantitative synthesis result document. See subsection 3.8.2.
• Qualitative synthesis result document. See subsection 3.8.1.
• Competing interests and sources of support. Financial and non-financial competing interests.
• Timeline.
• Author's contribution.
• Acknowledgements.
• Appendices.

Once all the stages and documentation are complete, reviewers can download the final report and check its completeness. To support the assessment of the final review, the PRISMA Statement [27] provides a document that consists of a 27-item checklist (see Figure B.2). This report-assessment document is available to every SciLRtool user to ensure comprehensive reports. Whenever researchers finish their work and proceed with report writing, SciLRtool advises them to look through this document.
The other important part of the reporting stage is making the synthesis results publicly available (i.e. displaying the evidence synthesis on the website). In such a case, every user can see its title, authors, approach (i.e. SLR or SM) and links to download every part of the report. Public availability is intended to increase the transparency of evidence synthesis. This approach is used in CADIMA but is absent in Parsifal; therefore, such a system has been developed in SciLRtool.
Chapter 4
Implementation

This chapter explains in detail the developed system, SciLRtool, which is an extension of Parsifal. For convenience, the sections are aligned with the Methodology chapter, except for the first, introductory section and the sections that did not require improvements. Table II provides an overview of SciLRtool's improvements over Parsifal.
4.1 Technology Adaptation

Initially, the original Parsifal code was ported from outdated and unsupported versions (Python 2.7 and Django 1.8.3) to newer ones (Python 3.8.5 and Django 3.1.3) to meet modern standards. The adaptation process required the following steps:

• correcting syntax;
• finding alternatives for outdated and unsupported dependencies;
• finding alternatives for outdated features;
• adapting to new Elsevier API changes.
4.2 Setting Up the Review

4.2.1 Define the Question Type (PICO, PIT, PO)

An HTML select tag with a submit button was added to support the user's choice of a question type. Whenever the question type is changed, users see the appropriate input fields; e.g. when a user changes PICOC to PIT, they will only see the P, I and T input fields.
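In Django terms, the control can be expressed as a simple form whose submitted value drives which input fields the view renders; the names below are illustrative, not SciLRtool's actual identifiers.

    from django import forms

    QUESTION_TYPES = [
        ("PICOC", "Population, Intervention, Comparison, Outcomes, Context"),
        ("PIT", "Population, Index test, Target condition"),
        ("PO", "Population, Outcome"),
    ]

    class QuestionTypeForm(forms.Form):
        # Rendered as the HTML select described above; on submit, the view
        # shows only the input fields belonging to the chosen question type.
        question_type = forms.ChoiceField(choices=QUESTION_TYPES, initial="PICOC")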
4.2.2 Define if SR or SM Will Be Performed

Another HTML select field was added to the evidence synthesis definition stage. Users can choose between "Systematic Literature Review" and "Systematic Mapping". Since some stages can be skipped when a user decides to perform an SM, red asterisks (*) were added beside the titles of the stages that cannot be skipped.
4.3 Literature Searching

The Elsevier API endpoint was adapted to the new API specification. Moreover, after search results are retrieved from the Elsevier API, links to the documents and their citation counts are now added.
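For illustration, here is a minimal sketch of a Scopus Search API request of the kind this integration relies on; the response handling is a simplification of the real code.

    import requests

    def search_scopus(query, api_key):
        response = requests.get(
            "https://api.elsevier.com/content/search/scopus",
            params={"query": query, "count": 25},
            headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
        )
        response.raise_for_status()
        entries = response.json()["search-results"]["entry"]
        # Keep the title, the citation count and the article link, which
        # SciLRtool now surfaces in the search results.
        return [{
            "title": e.get("dc:title"),
            "cited_by": e.get("citedby-count"),
            "url": next((l["@href"] for l in e.get("link", [])
                         if l.get("@ref") == "scopus"), None),
        } for e in entries]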
4.4 Duplicate Checking

SoftTF.IDF, a token-based similarity metric, was taken from the py_stringmatching library and integrated into the duplicate detection algorithm. This metric requires a threshold parameter t. An experiment was conducted to identify the optimal threshold: ten sentences of 10 to 15 words were taken, and one synthetic misspelling was added to each sentence. It was discovered that to detect a typo or misspelling in a lengthy article title (10 to 15 words) and mark it as a duplicate, t should be set to 0.85.
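The sketch below shows how such a check could look with py_stringmatching; we assume the raw SoftTF.IDF score is compared directly against t = 0.85, and the corpus and titles are illustrative rather than taken from SciLRtool's code.

    import py_stringmatching as sm

    tok = sm.WhitespaceTokenizer(return_set=True)
    title_a = tok.tokenize("an empirical study of agile testing practices")
    title_b = tok.tokenize("an empirical study of agile testing practises")  # one misspelling

    # The corpus of all candidate titles supplies the document frequencies.
    corpus = [title_a, title_b,
              tok.tokenize("a systematic review of code smells"),
              tok.tokenize("machine learning for defect prediction")]

    metric = sm.SoftTfIdf(corpus, sim_func=sm.JaroWinkler().get_raw_score,
                          threshold=0.85)
    score = metric.get_raw_score(title_a, title_b)
    print(score >= 0.85)  # near-identical titles score high and are flagged as duplicates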
4.5 Quality Assessment

A new approach to quality assessment is described in methodology section 3.6; a new interface was designed to support the new features of this approach.
4.5.1 Planning

First of all, the database tables QualityQuestion and QualityAnswer were changed. Where previously they had no relation at all, QualityAnswer now has a "question" field, a ForeignKey relation to the QualityQuestion table, so that every question has its own set of answers (see Figure 3.7). Figure 4.1 illustrates the new interface of the quality assessment checklist; a sketch of the revised models follows the figure.
Figure 4.1: Quality Assessment checklist new interface
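A sketch of the revised Django models under this structure; the field names are illustrative, and the real SciLRtool schema may carry additional fields.

    from django.db import models

    class QualityQuestion(models.Model):
        description = models.CharField(max_length=255)
        order = models.PositiveIntegerField(default=0)  # supports reordering in the checklist

    class QualityAnswer(models.Model):
        # The new ForeignKey: each answer now belongs to exactly one question,
        # so every question carries its own answer set (the CADIMA-style structure).
        question = models.ForeignKey(QualityQuestion, on_delete=models.CASCADE,
                                     related_name="answers")
        description = models.CharField(max_length=255)
        weight = models.FloatField()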
For the researchers' convenience, questions can now change their relative position in the question list.

Furthermore, when a user clicks the "edit" or "Add Question" button, a new modal window appears (see Figure 4.2).

Figure 4.2: Quality Assessment checklist: new modal window to add a quality question with answers.

In this window, users can give a new question a name and an unlimited number of answers with corresponding weights. It is also possible to attach an existing answer set to a new question, which is convenient when a new question has a similar or identical answer set to a previously defined question.

Secondly, a new settings bar was added for main authors (i.e. creators of an evidence synthesis) to support nominating team members for article assessment. The main author can allocate included studies to different team members, including themselves (see Figure 4.3).
Figure 4.3: Quality Assessment checklist: settings for the main author.

In the planning stage, the number of included articles is still unknown. Thus, it was decided to express the nomination amount as a percentage of the total number of articles. SciLRtool warns the user if the sum of percentages is less than 100%. However, the sum may be greater than 100%, in which case the system assumes that a peer review will be conducted.
4.5.2 Conducting

To match the new quality assessment checklist features, we redesigned the conducting part of the quality assessment (see Figure 4.4).

Figure 4.4: Quality Assessment: conducting.

Beneath the quality assessment title, an information text shows how many articles the main author has allocated to a particular team member. Exclusively for the main author, automated and manual assignment of articles are available:
• Automated assignment. Only the main author sees an "Articles assignment" button under the information text. It performs a cyclic assignment of articles to team members. For example, if 70% of the articles are allocated to user1 and 70% to user2, then the first 70% of the randomly ordered articles are assigned to user1, and the remaining 30% plus the first 40% to user2; i.e. 40% of the articles will be peer-reviewed. A sketch of this scheme is given after Figure 4.5.

• Manual assignment. Every individual article can be reassigned to another user or even left unassigned. Beneath every article there is a list of the users assigned to it and a link for manual reassignment. The list is coloured green if the currently logged-in user appears in it; otherwise, it is red. When a user clicks the "reassignment" link, a modal window appears in which the main author can conduct the reassignment (see Figure 4.5).
Figure 4.5: Quality Assessment: manual reassignment of article.
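Returning to the automated assignment described above, the sketch below reproduces the cyclic scheme; the function and data layout are hypothetical.

    import random

    def assign_articles(articles, allocations):
        """allocations: user -> percentage of articles; quotas summing past
        100% deliberately produce overlap, i.e. peer-reviewed articles."""
        shuffled = random.sample(articles, k=len(articles))
        assignments = {user: [] for user in allocations}
        cursor = 0
        for user, percent in allocations.items():
            count = round(len(shuffled) * percent / 100)
            for i in range(count):
                # Wrap around the shuffled list so that the overflow past
                # 100% re-visits the earliest articles.
                assignments[user].append(shuffled[(cursor + i) % len(shuffled)])
            cursor += count
        return assignments

    articles = [f"article-{i}" for i in range(10)]
    result = assign_articles(articles, {"user1": 70, "user2": 70})
    # user1 takes the first 7 articles of the shuffle; user2 takes the last 3
    # plus the first 4 again, so 4 articles (40%) end up peer-reviewed.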
When the main author finishes the assignment, team members can filter articles by assignment; the available filters in the "assigned to" HTML select tag are: me, all, unassigned, and the list of all users other than the current one.

Another noteworthy feature developed is the conflict system described in section 3.6.2. When two or more users create conflicts, the "Conflicts" tab displays them (see Figure 4.6).
Figure 4.6: Quality Assessment: conflicts example.
Only articles with questions that produced conflicts are displayed. Beside
every answer to a conflicted question, the list of authors who chose it is
displayed. Every team member can resolve conflicts by selecting the
appropriate answers and clicking the corresponding "Resolve" button.
Alternatively, a conflict is resolved if some team members change their
conflicting answers so that all answers to the question agree.
For the case of many conflicts, filters for "all" and "mine" conflicts were
implemented.
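For illustration, the following is a minimal sketch of the conflict-detection
idea under an assumed data model of (article, question, user, answer) tuples,
which need not match SciLRtool's actual schema:

# Minimal sketch: a question is in conflict for an article when team
# members chose different answers to it.
from collections import defaultdict

def find_conflicts(answers):
    """answers: iterable of (article, question, user, answer) tuples."""
    chosen = defaultdict(set)
    for article, question, _user, answer in answers:
        chosen[(article, question)].add(answer)
    return [key for key, values in chosen.items() if len(values) > 1]

answers = [
    ("paper1", "Q1", "user1", "Yes"),
    ("paper1", "Q1", "user2", "Partially"),  # disagreement: conflict
    ("paper1", "Q2", "user1", "No"),
    ("paper1", "Q2", "user2", "No"),         # agreement: no conflict
]
print(find_conflicts(answers))  # [('paper1', 'Q1')]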
4.6 Quantitative and Qualitative Synthesis
In the reporting stage, users can upload their quantitative and qualitative
synthesis files in DOCX, TXT, JPG, JPEG, PNG, BMP and GIF formats, up to
1 GB in size. The system includes those files in the final report. Along with
the files, users can specify the number of articles used in the quantitative
and qualitative synthesis so that an integrated PRISMA flow diagram is
generated (see Figure 4.7). The generation of the flow diagram consists of
two parts:
• Extract all the numbers needed for the diagram from the database.
• Put the numbers beside the corresponding arrows of an existing
flow-diagram template (a PNG image) using the Pillow library, as sketched
below.
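A minimal sketch of the second step with Pillow follows; the template file name
and pixel coordinates are assumptions made for illustration, since the real
values depend on the layout of the PRISMA template:

# Minimal sketch: draw the extracted numbers onto a PNG template of the
# PRISMA flow diagram using Pillow.
from PIL import Image, ImageDraw

def render_flow_diagram(numbers, template="prisma_template.png"):
    """numbers: dict mapping a diagram arrow to the count shown beside it."""
    image = Image.open(template).convert("RGB")
    draw = ImageDraw.Draw(image)
    coordinates = {  # assumed positions beside the template's arrows
        "identified": (120, 40),
        "after_duplicates": (120, 160),
        "included": (120, 280),
    }
    for label, number in numbers.items():
        draw.text(coordinates[label], str(number), fill="black")
    image.save("prisma_flow.png")

render_flow_diagram({"identified": 512, "after_duplicates": 430, "included": 57})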
In the data analysis tab of the conducting stage, a new "Export as PNG" button
was added beside every generated figure (see Figures 3.9, 3.10 and 3.11) so
that users can download those figures for later use.
Figure 4.7: Interface for quantitative and qualitative synthesis.
4.7 Generation of Documentation
4.7.1 Documentation Interface
Many text fields were added in the planning stage (protocol reporting) and
the reporting stage (final review reporting). In the planning stage, a new
tab called "Documentation" was created with the following text fields: Search
Strategy, Scoping Exercise, Study Inclusion Criteria, Study Screening Mode,
Quality Assessment, Quality Assessment Mode, Data Extraction Strategy and
Data Analysis. Every field has a round button with a question mark inside it;
whenever a user clicks it, supporting information specific to that field is
shown.
In the reporting stage, a new tab called "Final Review Documentation"
was produced. Along with the quantitative and qualitative synthesis fields
(see section 4.6), we added the Competing Interests and Sources of Support,
Timeline, Author's Contribution, Acknowledgements and Appendices fields.
Appendices is a file field that accepts DOCX, TXT, JPG, JPEG, PNG, BMP
and GIF files up to 1 GB in size. Only the Competing Interests and Sources of
Support field has the help button with supporting information, since the other
fields are self-explanatory.
In the "Export" tab of the reporting stage, new toggle fields were added to
cover the new features (see Figure 4.8).
Figure 4.8: New reporting stage, "Export" tab.
Compared with Parsifal's reporting (see Figure 3.12), it now has many
more toggle fields. A bug where a stage's title appeared in the report even
though all of the stage's fields were toggled off has been fixed.
4.7.2 Publishing Evidence Synthesis
The dropdown "Download" button from the bottom of the reporting stage
previously had the only option ".docx". The option’s name was changed to
"report," and the new option "Self-assessment checklist" was appended. When
4.7 Generation of Documentation
54
a user clicks the "Self-assessment checklist" option, the system will download
PRISMA’s reporting assessment checklist in DOC format.
The "Publish" button now allows users to make their evidence synthesis
publicly available. It appears in the new "Browse" navbar menu of the SciLRtool web site whenever it is published. This menu is dedicated to published
systematic reviews and systematic mappings (see Figure 4.9).
Figure 4.9: New Browse navbar-menu
The table of published evidence syntheses has the title, authors, approach,
last update and download columns. Every author is displayed along with a link
to the author's profile page. Clicking the "Download" button displays a list
of options; every option specifies which part of the evidence synthesis will
be downloaded (see also the mapping sketched after this list). The options
are the following:
• Review Description. This option includes the title, authors and description.
• Planning. The planning option includes all the fields specified in the
planning stage, excluding protocol documentation.
• Protocol. The protocol option includes review description, planning and
protocol documentation.
• Conducting. The conducting option includes Source Search Strings,
Number of Imported Studies and Flow Diagram.
• Data Analysis. The data analysis option includes only Quantitative
and Qualitative synthesis files.
• Data Extraction Sheet. The data extraction sheet option includes the
outcome of the data extraction step.
• Study Selection Outcome. The study selection outcome option includes the
outcome of the study selection step with inclusion/exclusion criteria,
source, publisher and other metadata.
• Selection Criteria. This option includes only the inclusion/exclusion criteria.
• Full Report. The full report option includes everything except the data
extraction sheet and study selection outcome.
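Conceptually, these options form a mapping from an option to the report parts
it bundles. The following sketch uses hypothetical identifiers rather than
SciLRtool's actual configuration, but it makes the structure explicit:

# Minimal sketch: each download option bundles a set of report parts.
DOWNLOAD_OPTIONS = {
    "review_description": ["title", "authors", "description"],
    "planning": ["planning_fields"],
    "protocol": ["title", "authors", "description", "planning_fields",
                 "protocol_documentation"],
    "conducting": ["search_strings", "imported_studies_count", "flow_diagram"],
    "data_analysis": ["synthesis_files"],
    "data_extraction_sheet": ["data_extraction_outcome"],
    "study_selection_outcome": ["study_selection_outcome"],
    "selection_criteria": ["inclusion_exclusion_criteria"],
}
# "Full report" bundles everything except the two spreadsheet outcomes.
DOWNLOAD_OPTIONS["full_report"] = sorted(
    {part for option, parts in DOWNLOAD_OPTIONS.items()
     if option not in ("data_extraction_sheet", "study_selection_outcome")
     for part in parts}
)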
The table of published evidence syntheses is also visible on an author's
profile page but is limited to the evidence syntheses to which the author has
contributed. The table appears beneath the "Work in progress" table, which
is not visible to external users. Authors who open their own page see both
tables and the links to the evidence syntheses in them.
Chapter 5
Evaluation and Discussion
5.1 Evaluation
To evaluate the developed system, we conducted several individual interviews
with people who have conducted Systematic Literature Reviews. Beforehand,
we created a list of questions and validation criteria. We applied the Likert
scale [29] as our validation criterion. Figure 5.1 shows the scale options
and their weights.
Figure 5.1: The Likert scale applied in interviews
The list of questions is the following:
1. What experience in SLR do you have?
2. In what domains have you conducted SLRs?
3. Have you ever used any tools supporting SLRs?
4. In your opinion, what features should such tools possess?
The purpose of those questions is to understand an interviewee's experience
with SLRs. The last question serves the purpose of gaining new ideas and
inspiration for future work. If we noticed that the interviewee did not possess
enough knowledge or experience in the SLR domain (e.g. had only read a couple
of SLRs but never conducted one), we terminated the interview. After asking the
questions, we demonstrated SciLRtool to the interviewee, explaining every stage
in detail and asking them to evaluate it on the Likert scale.
The interviews took place at Innopolis University in Innopolis, Russia.
We interviewed twelve people: 3 professors, 6 master's students and
3 bachelor's students. People were invited via an email to our university
colleagues, which explicitly described the interview procedure and asked
only people with SLR experience to respond.
To obtain the overall feedback for a particular stage of SciLRtool, we take
the average of the interviewees' scores. The final score of a stage thus lies
in the range from -2 to +2. The following sections discuss the results for
every stage and the new ideas we received from interviewees.
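For clarity, the scoring procedure can be summarised in a few lines of Python;
the labels for the negative weights are assumptions based on Figure 5.1:

# Minimal sketch: average the per-interviewee Likert weights and map the
# rounded result back to a verbal label.
def stage_feedback(scores):
    """scores: one Likert weight in [-2, +2] per interviewee."""
    average = sum(scores) / len(scores)
    labels = {-2: "Absolutely useless", -1: "Useless", 0: "Neutral",
              1: "Useful", 2: "Absolutely useful"}
    return average, labels[round(average)]

print(stage_feedback([2, 1, 2, 0, 1]))  # (1.2, 'Useful')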
5.1.1 Setting Up the Review and Protocol Definition
• Result AVG 0; Neutral
Setting Up the Review and Protocol Definition are the first stages of any SLR,
and their implementation in SciLRtool gives users the very first impression of
our tool. As it turned out, many people find the interface of our tool
unfriendly; "this design looks outdated", said one of our interviewees.
Furthermore, some people claimed they still prefer to undertake the Protocol
Definition step via Google Sheets or Overleaf because they are accustomed to
them.
5.1.2 Literature Searching
• Result AVG 0; Neutral
Most of our interviewees expected more functionality from our built-in
Literature Searching and mostly do not see a reason to use it. We found that
some people do not prefer Elsevier, our search provider, and use Google Scholar
instead. A professor complained about the small number of literature sources
(only ScienceDirect and Scopus). Moreover, during the interviews one master's
student said she wanted to see a journal rating in the search results table.
The feedback we received is fairly reasonable: researchers who conduct SLRs
require multiple search engines and sources in one place, with comprehensive
metadata for every research work. We consider those features the primary
course of our future work, since the Literature Searching stage is a deciding
factor in whether researchers choose SciLRtool over other tools.
5.1.3 Duplicate Checking
• Result AVG +1; Useful
Duplicate Checking is a minor yet reasonably helpful feature. People agree on
its usefulness and consider it a required feature of any tool supporting SLRs;
however, they are not impressed by it.
5.1.4 Study Selection
• Result AVG +2; Absolutely useful
Although SciLRtool did not modify Parsifal's implementation of the Study
Selection stage, all interviewees agreed on its usefulness and were satisfied
with the artefact the stage produces: an XLS table with all studies, their
inclusion/exclusion criteria, authors' comments and other metadata.
Nevertheless, some experts made a couple of remarks and suggestions. The first
was to add filtering of articles by year, journal and journal rating, since
SciLRtool possesses only a sorting feature. The other suggestion was to display
the author's comments on a particular article near its status (Accepted,
Rejected or Duplicated), so that users can see at a glance an article's status
and the reasoning behind it.
We strongly agree with the first suggestion and include it in our plan for
future work, since some users might have hundreds of different articles and
want to search or filter them in various ways. However, we consider the second
suggestion (adding comments near the status of an article) a personal
preference rather than needed functionality, since only one person suggested
it and we think it would overcomplicate the interface of SciLRtool.
5.1.5 Quality Assessment
• Result AVG +2; Absolutely useful
The new interface and features we developed for Quality Assessment satisfy
the needs of all our interviewees. They found the new Conflict System
especially helpful. We are satisfied with the obtained results and do not
include the Quality Assessment stage in the list for future work.
5.1.6 Data Extraction
• Result AVG -1; Useless
Since the process of data extraction is done by hands and is not automated.
However, extracting obtained articles in XLS format is unsuitable since most
users do their SLR’s in LATEX.
5.1.7 Data Analysis
• Result AVG +1; Useful
Most people spend much time creating qualitative analysis diagrams by hand
in third-party programs. SciLRtool auto-generates such diagrams and allows
users to download them. Most people consider the diagrams useful; however,
some interviewees suggested also generating the diagrams in TeX format using
the LaTeX TikZ package.
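Such an export is not implemented yet; the sketch below merely illustrates
what a TikZ/pgfplots-based export of a simple bar chart might emit:

# Minimal sketch of the suggested feature: emit a pgfplots bar chart in
# TeX instead of a rendered PNG image.
def to_pgfplots(counts):
    """counts: dict mapping a category (e.g. publication year) to a count."""
    coords = " ".join(f"({key},{value})" for key, value in counts.items())
    return "\n".join([
        r"\begin{tikzpicture}",
        r"\begin{axis}[ybar, xlabel={Year}, ylabel={Studies}]",
        rf"\addplot coordinates {{{coords}}};",
        r"\end{axis}",
        r"\end{tikzpicture}",
    ])

print(to_pgfplots({2018: 4, 2019: 7, 2020: 12}))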
5.1.8 Generation of Documentation
• Result AVG +2; Absolutely useful
People are primarily positive about the auto-generation of the final report.
Many especially noted the new PRISMA flow diagram. However, the DOCX format
of the final report is unsuitable for most users. Around half of the
interviewees suggested generating the final report in LaTeX, specifically in
Overleaf [30]. Overleaf is a popular TeX editor, and most of our colleagues
at Innopolis University use it.
We therefore set integration with Overleaf as the highest-priority goal of
our future work. Defining an Overleaf template that suits most users remains
an open problem.
5.1.9 Publishing Evidence Synthesis
• Result AVG +2; Absolutely useful
Publishing the completed work to be publicly available is beneficial both for
authors and users: users can find relevant SLRs, while authors can receive
feedback from other users about their evidence synthesis. The only suggestion
we received was to add searching and filtering by evidence synthesis title in
the "Browse" section of SciLRtool. We think this suggestion is sound since,
ordinarily, people want SLRs in a concrete domain area rather than the entire
list of existing evidence syntheses.
5.2 Discussion
We attempted to create a new product that includes the best practices of
other tools and supports all stages of the Systematic Literature Review. We
largely succeeded, as the average evaluation result for our system is +1
(useful); however, SciLRtool supports some stages without automating them.
The Data Extraction stage illustrates this: people consider it useless because
they must conduct data extraction by hand. The same applies to Setting Up the
Review and Protocol Definition. Those stages primarily help people who are new
to SLRs, since they force researchers to follow every stage so that the
literature review becomes systematic.
Additionally, all stages in SciLRtool provide helpful information so that
users understand their purpose. However, we found that many proficient
researchers prefer writing, extracting and analysing data directly in TeX
editors such as Overleaf. We believe that integrating SciLRtool with Overleaf
will engage more potential users and make our tool more competitive.
Besides, none of our interviewees had ever used any tools supporting SLRs,
so we could not estimate the competitiveness of SciLRtool or which of its
elements would make people favour it over other tools. Therefore, we aim to
conduct such interviews after the proposed future work is accomplished.
Chapter 6
Conclusion
Initially, we researched the domain of Systematic Literature Reviews in
Software Engineering and the tools supporting it, as described in the
Literature Review (Chapter 2). We then identified a niche in this domain: no
existing tool dedicated to Software Engineering supports all stages of
Systematic Literature Reviews. Thus, we decided to fill it by creating
SciLRtool. Accordingly, we designed SciLRtool to combine the best practices
of the Parsifal and CADIMA tools so that it supports every stage described in
the Methodology (Chapter 3). We then implemented SciLRtool and explained our
design decisions in the Implementation (Chapter 4).
Finally, we evaluated the developed system by interviewing our colleagues
at Innopolis University, demonstrating SciLRtool to them and asking questions,
as described in the Evaluation and Discussion (Chapter 5). By the end of the
interviews, we learned that most people experienced in SLRs find our tool
helpful in their practice, especially the Study Selection, Quality Assessment
and Publishing Evidence Synthesis stages. Moreover, we collected feedback
from our interviewees and defined our future work requirements, essentially
integration with Overleaf. Furthermore, we will continue to work on the stages
that were rated lower than "Absolutely useful".
Bibliography cited
[1] B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” vol. 2, Jan. 2007.
[2] D. Salah, R. Paige, and P. Cairns, “A systematic literature review for agile development processes and user centred design integration,” ACM International Conference Proceeding Series, May 2014. doi: 10.1145/2601248.2601276.
[3] J. Thomas and J. Brunton, “EPPI-Reviewer 4: Software for research synthesis,” Jan. 2010.
[4] C. Kohl, E. McIntosh, S. Unger, N. Haddaway, S. Kecke, J. Schiemann, and R. Wilhelm, “Online tools supporting the conduct and reporting of systematic reviews and systematic maps: A case study on CADIMA and review of existing tools,” Environmental Evidence, vol. 7, Feb. 2018. doi: 10.1186/s13750-018-0115-5.
[5] V. Freitas, Parsifal, https://parsif.al, [Online; accessed 28-January-2021], 2018.
[6] E. Akl, D. Altman, P. Aluko, L. Askie, D. Beaton, J. Berlin, B. Bhaumik, C. Bingham, M. Boers, A. Booth, I. Boutron, S. Brennan, M. Briel, S. Briscoe, J. Busse, D. Caldwell, M. Cargo, A. Carrasco-Labra, A. Chaimani, and C. Young, Cochrane Handbook for Systematic Reviews of Interventions. Oct. 2019, isbn: 9781119536604.
[7] D. Budgen, S. Charters, M. Turner, P. Brereton, B. Kitchenham, and S. Linkman, “Investigating the applicability of the evidence-based paradigm to software engineering,” Proceedings - International Conference on Software Engineering, Apr. 2006. doi: 10.1145/1137661.1137665.
[8] D. Budgen, M. Turner, P. Brereton, and B. Kitchenham, “Using mapping studies in software engineering,” Proceedings of PPIG 2008, vol. 2, Jan. 2008.
[9] G. Tsafnat, P. Glasziou, M. K. Choong, A. Dunn, F. Galgani, and E. Coiera, “Systematic review automation technologies,” Systematic Reviews, vol. 3, p. 74, Jul. 2014. doi: 10.1186/2046-4053-3-74.
[10] G. Tsafnat, A. Dunn, P. Glasziou, and E. Coiera, “The automation of systematic reviews,” BMJ (Clinical research ed.), vol. 346, f139, Jan. 2013. doi: 10.1136/bmj.f139.
[11] M. Ouzzani, H. Hammady, Z. Fedorowicz, and A. Elmagarmid, “Rayyan—a web and mobile app for systematic reviews,” Systematic Reviews, vol. 5, Dec. 2016. doi: 10.1186/s13643-016-0384-4.
[12] K. James, N. Randall, and N. Haddaway, “A methodology for systematic mapping in environmental sciences,” Environmental Evidence, vol. 5, p. 7, Apr. 2016. doi: 10.1186/s13750-016-0059-6.
[13] C. Counsell, “Formulating questions and locating primary studies for inclusion in systematic reviews,” Annals of Internal Medicine, vol. 127, pp. 380–387, 1997.
[14] D. Sackett, W. Richardson, W. Rosenberg, and B. Haynes, “Evidence-based medicine: How to practice and teach EBM,” Churchill Livingstone, vol. 2, Jan. 2005.
[15] P. Doshi, M. Jones, and T. Jefferson, “Rethinking credible evidence synthesis,” BMJ, vol. 344, 2012, issn: 0959-8138. doi: 10.1136/bmj.d7898. [Online]. Available: https://www.bmj.com/content/344/bmj.d7898.
[16] Elsevier, https://www.elsevier.com, [Accessed: 2021-01-08].
[17] A. Lunev, Alternatives to Elsevier? May 2020.
[18] B. Ballew, “Elsevier’s Scopus® database,” Journal of Electronic Resources in Medical Libraries, vol. 6, pp. 245–252, Jul. 2009. doi: 10.1080/15424060903167252.
[19] ScienceDirect, http://www.sciencedirect.com/, [Accessed: 2021-01-09].
[20] A. Elmagarmid, P. Ipeirotis, and V. Verykios, “Duplicate record detection: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, pp. 1–16, Feb. 2007. doi: 10.1109/TKDE.2007.250581.
[21] R. Aabenhus, J. U. Jensen, and J. Cals, “Incorrect inclusion of individual studies and methodological flaws in systematic review and meta-analysis,” The British Journal of General Practice, vol. 64, pp. 221–222, May 2014. doi: 10.3399/bjgp14X679615.
[22] M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg, “Adaptive name matching in information integration,” IEEE Intelligent Systems, vol. 18, pp. 16–23, Oct. 2003. doi: 10.1109/MIS.2003.1234765.
[23] Y. Zhou, H. Zhang, X. Huang, S. Yang, M. Ali Babar, and H. Tang, “Quality assessment of systematic reviews in software engineering,” Apr. 2015, pp. 1–14. doi: 10.1145/2745802.2745815.
[24] S. Kiritchenko, B. de Bruijn, S. Carini, J. Martin, and I. Sim, “ExaCT: Automatic extraction of clinical trial characteristics from journal publications,” BMC Medical Informatics and Decision Making, vol. 10, p. 56, Sep. 2010. doi: 10.1186/1472-6947-10-56.
[25] I. J. Marshall and B. C. Wallace, “Toward systematic review automation: A practical guide to using machine learning tools in research synthesis,” Systematic Reviews, vol. 8, no. 1, p. 163, Jul. 2019, issn: 2046-4053. doi: 10.1186/s13643-019-1074-9.
[26] L. B. Shelby and J. J. Vaske, “Understanding meta-analysis: A review of the methodological literature,” Leisure Sciences, vol. 30, no. 2, pp. 96–110, 2008. doi: 10.1080/01490400701881366.
[27] D. Moher, A. Liberati, J. Tetzlaff, D. G. Altman, and the PRISMA Group, “Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement,” PLOS Medicine, vol. 6, no. 7, pp. 1–6, Jul. 2009. doi: 10.1371/journal.pmed.1000097.
[28] D. Moher, A. Liberati, J. Tetzlaff, and D. G. Altman, “Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement,” PLoS Medicine, vol. 6, no. 7, e1000097, Jul. 2009. doi: 10.1371/journal.pmed.1000097.
[29] R. Likert, “A technique for the measurement of attitudes,” Ph.D. dissertation, The Science Press, New York, 1932.
[30] J. Hammersley and J. Lees-Miller, Overleaf, 2012. [Online]. Available: https://www.overleaf.com/.
Appendix A
Existing Steps for Systematic Reviews
Figure A.1: Existing steps for systematic reviews (some deviations are
possible) [9].
Appendix B
PRISMA documents
Figure B.1: PRISMA flow diagram template
Figure B.2: PRISMA report-assessment checklist
Appendix C
SciLRtool Improvements over Parsifal
Table II: SciLRtool improvements over Parsifal, briefly described for each stage with the corresponding information formats.

Setting Up the Review
New features: 1. Define a question type (PICOC, PIT, PO); 2. Define if an SLR or an SM will be performed.
Information format: Manual entry.

Protocol Development
New features: -
Information format: -

Literature Searching
New features: Links to searched articles.
Information format: -

Duplicate Checking
New features: New SoftTF.IDF metric for string comparison.
Information format: -

Study Selection
New features: -
Information format: -

Quality Assessment
New features:
1. Nominating team members to be involved during Quality Assessment (main author).
2. Changing a member's assigned articles manually or automatically (main author).
3. New "Each Quality Question has its set of Quality Answers" approach.
4. Filtering articles by members assigned to them.
5. New Conflict System, which resolves conflicts between team members.
6. Filtering articles in conflict by members assigned to them.
Information format: Manual entry.

Data Extraction
New features: -
Information format: -

Quantitative and Qualitative Synthesis
New features:
1. Downloading Data Analysis files.
2. Uploading Quantitative and Qualitative synthesis files.
3. Entering the numbers of studies involved in Quantitative and Qualitative synthesis.
Information format:
1. Downloading in PNG.
2. Uploading in PNG, JPG, JPEG, BMP, GIF, DOCX, TXT.
3. Manual entry.

Generation of Documentation
New features:
1. Filling final review documentation (timeline, contribution, etc.).
2. Filling protocol documentation (search strategy, data extraction strategy, etc.).
3. Downloading PRISMA's self-assessment checklist file for final review report assessment.
4. Generating PRISMA's flow diagram automatically.
5. Publishing Evidence Synthesis for it to be publicly available.
Information format:
1. Manual entry, except Acknowledgements, which is DOCX entry.
2. Manual entry.
3. Downloading in DOC.
4. Included in the conducting report DOCX file as a PNG file.
5. Downloading other Evidence Syntheses in DOCX (reporting parts) or XLS (data extraction sheet and study selection outcome).