Proceedings of the Future Technologies Conference (FTC) 2018, Volume 1

ISSN 2194-5357    ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-02685-1    ISBN 978-3-030-02686-8 (eBook)
https://doi.org/10.1007/978-3-030-02686-8
Library of Congress Control Number: 2018957983

© Springer Nature Switzerland AG 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Editor's Preface

The Future Technologies Conference (FTC) 2018 was held on November 13-14, 2018, in Vancouver at the Marriott Pinnacle Downtown Hotel, with sweeping views of the coastal mountains, Coal Harbour, and Vancouver's city skyline. The city of Vancouver is considered one of the most beautiful cities in the world. With great privilege, we present the Proceedings of FTC 2018 to the readers in two volumes. We hope that you will find them useful, exciting, and inspiring.

FTC 2018 aims at producing a bright picture and charming landscape for future technologies by providing a platform to present the best of current systems research and practice, emphasizing innovation and quantified experience. The ever-changing scope and rapid development of future technologies create new problems and questions, resulting in a real need to share brilliant ideas and stimulate awareness of this important research field. Researchers, academics, and technologists from leading universities, research firms, government agencies, and companies from 50+ countries presented the latest research at the forefront of technology and computing.

After the double-blind review process, we finally selected 173 full papers, including six poster papers, for publication. We would like to express our gratitude and appreciation to all of the reviewers who helped us maintain the high quality of the manuscripts included in these conference proceedings. We would also like to extend our thanks to the members of the organizing team for their hard work. We are tremendously grateful for the contributions and support received from authors, participants, keynote speakers, program committee members, session chairs, organizing committee members, steering committee members, and others in their various roles.
Their valuable support, suggestions, dedicated commitment, and hard work have made FTC 2018 a success. Finally, we would like to thank the conference's sponsors and partners: Western Digital, IBM Research, and Nature Electronics. We believe this event will help further disseminate new ideas and inspire more international collaborations. We hope that all the participants of FTC 2018 had a wonderful and fruitful time at the conference and that our overseas guests enjoyed their sojourn in Vancouver!

Kind Regards,
Kohei Arai

Contents

Towards in SSVEP-BCI Systems for Assistance in Decision-Making
  Rodrigo Hübner, Linnyer Beatryz Ruiz Aylon, and Gilmar Barreto
Image-Based Wheel-Base Measurement in Vehicles: A Sensitivity Analysis to Depth and Camera's Intrinsic Parameters
  David Duron-Arellano, Daniel Soto-Lopez, and Mehran Mehrandezh
Generic Paper and Plastic Recognition by Fusion of NIR and VIS Data and Redundancy-Aware Feature Ranking
  Alla Serebryanyk, Matthias Zisler, and Claudius Schnörr
Hand Gesture Recognition with Leap Motion
  Lin Feng, Youchen Du, Shenglan Liu, Li Xu, Jie Wu, and Hong Qiao
A Fast and Simple Sample-Based T-Shirt Image Search Engine
  Liliang Chan, Pai Peng, Xiangyu Liu, Xixi Cao, and Houwei Cao
Autonomous Robot KUKA YouBot Navigation Based on Path Planning and Traffic Signals Recognition
  Carlos Gordón, Patricio Encalada, Henry Lema, Diego León, and Cristian Peñaherrera
Towards Reduced Latency in Saccade Landing Position Prediction Using Velocity Profile Methods
  Henry Griffith, Subir Biswas, and Oleg Komogortsev
Wireless Power Transfer Solutions for 'Things' in the Internet of Things
  Tim Helgesen and Moutaz Haddara
Electronic Kintsugi
  Vanessa Julia Carpenter, Amanda Willis, Nikolaj "Dzl" Møbius, and Dan Overholt
A Novel and Scalable Naming Strategy for IoT Scenarios
  Alejandro Gómez-Cárdenas, Xavi Masip-Bruin, Eva Marín-Tordera, and Sarang Kahvazadeh
The IoT and Unpacking the Heffalump's Trunk
  Joseph Lindley, Paul Coulton, and Rachel Cooper
Toys That Talk to Strangers: A Look at the Privacy Policies of Connected Toys
  Wahida Chowdhury
A Reinforcement Learning Multiagent Architecture Prototype for Smart Homes (IoT)
  Mario Rivas and Fernando Giorno
Real-Time Air Pollution Monitoring Systems Using Wireless Sensor Networks Connected in a Cloud-Computing, Wrapped up Web Services
  Byron Guanochanga, Rolando Cachipuendo, Walter Fuertes, Santiago Salvador, Diego S. Benítez, Theofilos Toulkeridis, Jenny Torres, César Villacís, Freddy Tapia, and Fausto Meneses
A Multi-agent Model for Security Awareness Driven by Home User's Behaviours
  Farhad Foroughi and Peter Luksch
Light Weight Cryptography for Resource Constrained IoT Devices
  Hessa Mohammed Zaher Al Shebli and Babak D. Beheshti
A Framework for Ranking IoMT Solutions Based on Measuring Security and Privacy
  Faisal Alsubaei, Abdullah Abuhussein, and Sajjan Shiva
CUSTODY: An IoT Based Patient Surveillance Device
  Md. Sadad Mahamud, Md. Manirul Islam, Md. Saniat Rahman, and Samiul Haque Suman
Personal Branding and Digital Citizenry: Harnessing the Power of Data and IOT
  Fawzi BenMessaoud, Thomas Sewell III, and Sarah Ryan
Testing of Smart TV Applications: Key Ingredients, Challenges and Proposed Solutions
  Bestoun S. Ahmed and Miroslav Bures
Dynamic Evolution of Simulated Autonomous Cars in the Open World Through Tactics
  Joe R. Sylnice and Germán H. Alférez
Exploring the Quantified Experience: Finding Spaces for People and Their Voices in Smarter, More Responsive Cities
  H. Patricia McKenna
Prediction of Traffic-Violation Using Data Mining Techniques
  Md Amiruzzaman
An Intelligent Traffic Management System Based on the Wi-Fi and Bluetooth Sensing and Data Clustering
  Hamed H. Afshari, Shahrzad Jalali, Amir H. Ghods, and Bijan Raahemi
Economic and Performance Based Approach to the Distribution System Expansion Planning Problem Under Smart Grid Framework
  Hatem Zaki, R. A. Swief, T. S. Abdel-Salam, and M. A. M. Mostafa
Connecting to Smart Cities: Analyzing Energy Times Series to Visualize Monthly Electricity Peak Load in Residential Buildings
  Shamaila Iram, Terrence Fernando, and Richard Hill
Anomaly Detection in Q & A Based Social Networks
  Neda Soltani, Elham Hormizi, and S. Alireza Hashemi Golpayegani
A Study of Measurement of Audience in Social Networks
  Mohammed Al-Maitah
Predicting Disease Outbreaks Using Social Media: Finding Trustworthy Users
  Razieh Nokhbeh Zaeem, David Liau, and K. Suzanne Barber
Detecting Comments Showing Risk for Suicide in YouTube
  Jiahui Gao, Qijin Cheng, and Philip L. H. Yu
Twitter Analytics for Disaster Relevance and Disaster Phase Discovery
  Abeer Abdel Khaleq and Ilkyeun Ra
Incorporating Code-Switching and Borrowing in Dutch-English Automatic Language Detection on Twitter
  Samantha Kent and Daniel Claeser
A Systematic Review of Time Series Based Spam Identification Techniques
  Iqra Muhammad, Usman Qamar, and Rabia Noureen
CNN with Limit Order Book Data for Stock Price Prediction
  Jaime Niño, German Hernandez, Andrés Arévalo, Diego Leon, and Javier Sandoval
Implementing Clustering and Classification Approaches for Big Data with MATLAB
  Katrin Pitz and Reiner Anderl
Visualization Tool for JADE Platform (JEX)
  Halim Djerroud and Arab Ali Cherif
Decision Tree-Based Approach for Defect Detection and Classification in Oil and Gas Pipelines
  Abduljalil Mohamed, Mohamed Salah Hamdi, and Sofiene Tahar
Impact of Context on Keyword Identification and Use in Biomedical Literature Mining
  Venu G. Dasigi, Orlando Karam, and Sailaja Pydimarri
A Cloud-Based Decision Support System Framework for Hydropower Biological Evaluation
  Hongfei Hou, Zhiqun Daniel Deng, Jayson J. Martinez, Tao Fu, Jun Lu, Li Tan, John Miller, and David Bakken
An Attempt to Forecast All Different Rainfall Series by Dynamic Programming Approach
  Swe Swe Aung, Shin Ohsawa, Itaru Nagayama, and Shiro Tamaki
Non-subsampled Complex Wavelet Transform Based Medical Image Fusion
  Sanjay N. Talbar, Satishkumar S. Chavan, and Abhijit Pawar
Predicting Concussion Symptoms Using Computer Simulations
  Milan Toma
Integrating Markov Model, Bivariate Gaussian Distribution and GPU Based Parallelization for Accurate Real-Time Diagnosis of Arrhythmia Subclasses
  Purva R. Gawde, Arvind K. Bansal, and Jeffery A. Nielson
Identification of Glioma from MR Images Using Convolutional Neural Network
  Nidhi Saxena, Rochan Sharma, Karishma Joshi, and Hukum Singh Rana
Array of Things for Smart Health Solutions Injury Prevention, Performance Enhancement and Rehabilitation
  S. M. N. Arosha Senanayake, Siti Asmah @ Khairiyah Binti Haji Raub, Abdul Ghani Naim, and David Chieng
Applying Waterjet Technology in Surgical Procedures
  George Abdou and Nadi Atalla
Blockchain Revolution in the Healthcare Industry
  Sergey Avdoshin and Elena Pesotskaya
Effective Reversible Data Hiding in Electrocardiogram Based on Fast Discrete Cosine Transform
  Ching-Yu Yang, Lian-Ta Cheng, and Wen-Fong Wang
Semantic-Based Resume Screening System
  Yu Hou and Lixin Tao
The Next Generation of Artificial Intelligence: Synthesizable AI
  Supratik Mukhopadhyay, S. S. Iyengar, Asad M. Madni, and Robert Di Biano
Cognitive Natural Language Search Using Calibrated Quantum Mesh
  Rucha Kulkarni, Harshad Kulkarni, Kalpesh Balar, and Praful Krishna
Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems
  Souvik Sengupta, Jordi Garcia, and Xavi Masip-Bruin
Predicting Head-to-Head Games with a Similarity Metric and Genetic Algorithm
  Arisoa S. Randrianasolo and Larry D. Pyeatt
Artificial Human Swarms Outperform Vegas Betting Markets
  Louis Rosenberg and Gregg Willcox
Genetic Algorithm Based on Enhanced Selection and Log-Scaled Mutation Technique
  Neeraj Gupta, Nilesh Patel, Bhupendra Nath Tiwari, and Mahdi Khosravy
Second-Generation Web Interface to Correcting ASR Output
  Oldrich Kruza and Vladislav Kubon
A Collaborative Multi-agent System for Oil Palm Pests and Diseases Global Situation Awareness
  Salama A. Mostafa, Ahmed Abdulbasit Hazeem, Shihab Hamad Khaleefah, Aida Mustapha, and Rozanawati Darman
Using Mouse Dynamics for Continuous User Authentication
  Osama A. Salman and Sarab M. Hameed
Ten Guidelines for Intelligent Systems Futures
  Daria Loi
Towards Computing Technologies on Machine Parsing of English and Chinese Garden Path Sentences
  Jiali Du, Pingfang Yu, and Chengqing Zong
Music Recommender According to the User Current Mood
  Murtadha Al-Maliki
Development of Extreme Learning Machine Radial Basis Function Neural Network Models to Predict Residual Aluminum for Water Treatment Plants
  C. D. Jayaweera and N. Aziz
Multi-layer Mangrove Species Identification
  Fenddy Kong Mohd Aliff Kong, Mohd Azam Osman, Wan Mohd Nazmee Wan Zainon, and Abdullah Zawawi Talib
Intelligent Seating System with Haptic Feedback for Active Health Support
  Peter Gust, Sebastian P. Kampa, Nico Feller, Max Vom Stein, Ines Haase, and Valerio Virzi
Intelligence in Embedded Systems: Overview and Applications
  Paul D. Rosero-Montalvo, Vivian F. López Batista, Edwin A. Rosero, Edgar D. Jaramillo, Jorge A. Caraguay, José Pijal-Rojas, and D. H. Peluffo-Ordóñez
Biometric System Based on Kinect Skeletal, Facial and Vocal Features
  Yaron Lavi, Dror Birnbaum, Or Shabaty, and Gaddi Blumrosen
Towards the Blockchain-Enabled Offshore Wind Energy Supply Chain
  Samira Keivanpour, Amar Ramudhin, and Daoud Ait Kadi
Optimal Dimensionality Reduced Quantum Walk and Noise Characterization
  Chen-Fu Chiang
Implementing Dual Marching Square Using Visualization Tool Kit (VTK)
  Manu Garg and Sudhanshu Kumar Semwal
Procedural 3D Tile Generation for Level Design
  Anthony Medendorp and Sudhanshu Kumar Semwal
Some Barriers Regarding the Sustainability of Digital Technology for Long-Term Teaching
  Stefan Svetsky and Oliver Moravcik
Digital Collaboration with a Whiteboard in Virtual Reality
  Markus Petrykowski, Philipp Berger, Patrick Hennig, and Christoph Meinel
Teaching Practices with Mobile in Different Contexts
  Anna Helena Silveira Sonego, Leticia Rocha Machado, Cristina Alba Wildt Torrezzan, and Patricia Alejandra Behar
Accessibility and New Technology MOOC - Disability and Active Aging: Technological Support
  Samuel A. Navarro Ortega and M. Pilar Munuera Gómez
Lecturing to Your Students: Is Their Heart In It?
  Aidan McGowan, Philip Hanna, Des Greer, and John Busch
Development of Collaborative Virtual Learning Environments for Enhancing Deaf People's Learning in Jordan
  Ahmad A. Al-Jarrah
Game Framework to Improve English Language Learners' Motivation and Performance
  Monther M. Elaish, Norjihan Abdul Ghani, Liyana Shuib, and Abdulmonem I. Shennat
Insights into Design of Educational Games: Comparative Analysis of Design Models
  Rabail Tahir and Alf Inge Wang
Immersive and Collaborative Classroom Experiences in Virtual Reality
  Derek Jacoby, Rachel Ralph, Nicholas Preston, and Yvonne Coady
The Internet of Toys, Connectedness and Character-Based Play in Early Education
  Pirita Ihamäki and Katriina Heljakka
Learning Analytics Research: Using Meta-Review to Inform Meta-Synthesis
  Xu Du, Juan Yang, Mingyan Zhang, Jui-Long Hung, and Brett E. Shelton
Students' Evidential Increase in Learning Using Gamified Learning Environment
  V. Z. Vanduhe, H. F. Hassan, Dokun Oluwajana, M. Nat, A. Idowu, J. J. Agbo, and L. Okunlola
Improving the Use of Virtual Worlds in Education Through Learning Analytics: A State of Art
  Fredy Gavilanes-Sagnay, Edison Loza-Aguirre, Diego Riofrío-Luzcando, and Marco Segura-Morales
Design and Evaluation of an Online Digital Storytelling Course for Seniors
  David Kaufman, Diogo Silva, Robyn Schell, and Simone Hausknecht
The Role of Self-efficacy in Technology Acceptance
  Saleh Alharbi and Steve Drew
An Affective Sensitive Tutoring System for Improving Student's Engagement in CS
  Ruth Agada, Jie Yan, and Weifeng Xu
Multimedia Interactive Boards as a Teaching and Learning Tool in Environmental Education: A Case-Study with Portuguese Students
  Cecília M. Antão
Author Index
Towards in SSVEP-BCI Systems for Assistance in Decision-Making

Rodrigo Hübner (1,3), Linnyer Beatryz Ruiz Aylon (2), and Gilmar Barreto (3)

1 Computer Department, Computer Interfaces Research Group, Federal University of Technology - Paraná, Campo Mourão, Paraná 87301-899, Brazil, rodrigohubner@utfpr.edu.br
2 Manna Research Group, State University of Maringá, Maringá, Paraná 87020-900, Brazil
3 School of Electrical and Computer Engineering, Intelligent Systems and Control Laboratory, State University of Campinas, Campinas, São Paulo 13083-970, Brazil

Abstract. In recent years, Brain-Computer Interface (BCI) research has placed a major focus on systems outside the clinical scope. These systems have been used to control electrical and electronic equipment, digital games and other kinds of "control". Such control can be accomplished through decision-making by a BCI system. A well-known paradigm for this purpose is SSVEP (the steady-state visually evoked potential paradigm), in which targets flickering at different frequencies can be distinguished through the visual responses they evoke. This paper proposes a human-computer interaction system using SSVEP for assistance in decision-making. In particular, the work describes a prototype of traffic lights proposed as a case study. The experiments with this prototype create decision-making situations in which the SSVEP-BCI system assists the individual in deciding correctly.

Keywords: BCI · SSVEP · Decision-making

1 Introduction

Brain-Computer Interfaces (BCI) [3,7,19] are commonly used for the development of systems that can improve the quality of life of people who have some physical constraint (visual, auditory or motor) that limits their capacity. In this way, a BCI system should minimize the subject's disability by assisting in a task that the subject could not perform alone. An example of this is the speller of [10], a system in which a subject with a speech impairment focuses on an array of letters on a monitor and, through the visual stimuli generated, the BCI system classifies which letter the subject is looking at and displays it.

A BCI system can also aid in the decision-making of healthy subjects. There are situations that can be considered risky, for example braking a vehicle while driving when you see a red traffic light or a car's headlights flashing ahead. In such situations, a BCI system can assist the driver if the decision taken by him is not the correct one. With this premise, we are developing work to investigate the SSVEP (Steady-State Visually Evoked Potential) paradigm [13–15], which is used to determine which flickering target a subject is focused on and can be recognized with electroencephalography (EEG) equipment. In order for the BCI system to make the right decision, it is necessary that the different events be presented at different flicker frequencies.

To conduct this research, we built simulations that reproduce SSVEP-based techniques, because when this concept of decision-making is applied to the real world, such situations cannot be reproduced in the same way with the traditional SSVEP paradigm: real bright targets do not flicker at a frequency that the BCI system can classify, and recreating them would endanger the lives of the experiment subjects.
In this context, the objective of this paper is to present an empirical study of the techniques used for processing SSVEP signals, aiming at the development of an SSVEP-BCI system to assist in decision-making in situations close to the real world. For this, we built a prototype of traffic lights with light-emitting diodes (LEDs) to create decision-making situations.

To fulfill this objective, a set of experiments based on the SSVEP paradigm was reproduced using a public database, with the intention of evaluating the programming methods. We also constructed databases of acquired EEG signals, evaluated with a prototype of LED-based traffic lights that generates the visual evocation needed for the experimentation. Finally, we investigated different SSVEP stimulation strategies, so that the constructed prototype traffic lights behave closer to reality, without displaying the traditional flicker frequencies of the SSVEP paradigm.

This paper is divided as follows. Section 2 presents a brief grounding in the SSVEP paradigm. Section 3 presents some related works. Section 4 presents experiments with the public database and with the constructed prototype, using the traditional SSVEP model. Section 5 presents directions for a BCI system for evaluating decision-making at traffic lights, using the SSVEP paradigm with non-flickering targets. Finally, Sect. 6 presents the conclusion.

2 SSVEP-BCI Background

BCI paradigms determine what the subject must do, and how, to produce certain known patterns that can be interpreted by a BCI system. The subject must generally undergo equipment calibration and training before the experiment. The configuration of the physical environment, the positioning of the electrodes and the software set are directly associated with the paradigm used. The paradigms currently used in BCI systems are selective attention and motor imagery [18]. In this paper we focus on selective attention.

Selective Attention. BCI paradigms based on selective attention require external stimuli that result in response patterns in the brain [8]. Such stimuli may be visual, auditory or tactile. In this method, each stimulus is associated with a specific command, and the user must focus attention on a target stimulus to generate the corresponding action. In this work visual stimuli are used, and the main paradigms that use these stimuli are Steady-State Evoked Potentials (SSEP) and P300.

– P300: the P300 paradigm consists of obtaining a series of positive peaks in the input signal, with a variation in amplitude in a short space of time. This variation should occur after the appearance of an infrequent target stimulus among several frequent ones [6]. In this way it is possible to visualize a variation in signal amplitude in the time domain. Stimuli can be auditory, visual or sensory.
An example of a visual stimulus is a letter or symbol on a computer screen that the subject is focused on, which, upon receiving a contrast change (generally lighter), generates a peak in the signal approximately 300 milliseconds after the stimulus evocation. This peak gives the paradigm its name, P300 (a peak at 300 ms).
– SSEP: responses to periodic external stimuli that can be verified in the signal obtained from the corresponding cortical region. The stimuli may be sensory or auditory, but are mainly visual, in which case the paradigm is known in the literature as SSVEP.
– SSVEP: stimuli triggered by a visual target flickering at a given frequency in front of the subject. Usually these stimuli are generated by a computer simulation on a monitor screen, but it is also common to use LEDs [25]. When a monitor screen is used, the experiment must be set up so that the screen refresh rate is a multiple of the flicker frequencies used as targets. A target may be a light flickering at a frequency of 8 Hz; when a subject is visually focused on it, a response around 8 Hz can be recognized in the electroencephalogram (EEG) signal obtained from the visual cortex. A study conducted by [20] found that stimulation frequencies can range from 5 to 100 Hz. The SSVEP signal has other characteristics, such as luminance, contrast and chromaticity, that can be modulated together with the flicker frequency of a target stimulus [4].

2.1 Signal Processing in the SSVEP Paradigm

A BCI experiment based on the SSVEP paradigm depends on how the stimuli are presented to the subject and on how the signals obtained through the EEG equipment are processed. We present the processing steps of the SSVEP signal below.

Pre-processing of EEG Signals. In pre-processing, the EEG signal is filtered without losing relevant information. In addition, the signal can be improved by separating out the noise present, as quantified by the signal-to-noise ratio (SNR). When the SNR of the signal is low, detectable patterns are difficult to find; conversely, when the SNR is high, the patterns are easy to identify. Signal filtering techniques can be applied in combination, facilitating the identification of the signals of interest. Temporal and spatial filters are used for signal pre-processing. In this paper we used bandpass temporal filtering by the finite impulse response (FIR) method [22] and the Common Average Reference (CAR) spatial filter [17], which consists of the point-by-point subtraction from each signal of the mean of the EEG signals obtained from all electrodes.

Feature Extraction. This step searches for the features that best describe the expected properties of the input signal. Such characteristics can be obtained using the signal waveform analyzed in the time domain, the frequency components in the frequency domain, the power density spectrum, time-frequency analysis (e.g. the Short-Time Fourier Transform, STFT), autoregressive models, etc. [11]. SSVEP-BCI systems use feature-extraction methods based on the spectral information present in the EEG signal. For a given set of evoked frequencies, the Power Spectral Density (PSD) calculation extracts from the signal the information of interest to be classified. The main methods used for SSVEP spectral analysis are the Filter Bank, the Spectrogram, the Welch method [2] and the Multitaper method [16]. In this work the Multitaper method was used, as implemented in the MNE-Python tool (http://martinos.org/mne).
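To make the pre-processing and feature-extraction steps just described concrete, the following is a minimal sketch in Python; it is our illustration, not the authors' code, and the array shape, filter order and band limits are assumptions.

```python
# Hedged sketch of pre-processing and feature extraction for one recording.
# `eeg` is assumed to be an (n_channels, n_samples) array sampled at `sfreq`.
import numpy as np
from scipy.signal import firwin, filtfilt
from mne.time_frequency import psd_array_multitaper

def preprocess_and_psd(eeg, sfreq, band=(5.0, 50.0), n_taps=257):
    # Band-pass FIR filter (Hamming window), applied forward and backward
    # so that no phase distortion is introduced.
    nyq = sfreq / 2.0
    taps = firwin(n_taps, [band[0] / nyq, band[1] / nyq],
                  pass_zero=False, window="hamming")
    filtered = filtfilt(taps, [1.0], eeg, axis=-1)

    # Common Average Reference: subtract, point by point, the mean of all
    # channels from every channel.
    car = filtered - filtered.mean(axis=0, keepdims=True)

    # Multitaper PSD; the per-channel spectrum is the feature set handed on
    # to feature selection and classification.
    psd, freqs = psd_array_multitaper(car, sfreq, fmin=band[0],
                                      fmax=band[1], verbose=False)
    return psd, freqs
```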
Feature Selection. Feature extraction can produce a large number of variables to be analyzed later by a classifier. In this step, the most relevant features of the extracted set are selected, improving the performance of the classifier in terms of both execution speed and effectiveness. Feature selection techniques include filter methods (Pearson's correlation coefficient and the Davies-Bouldin index) and wrapper methods [2]. The wrapper-based Recursive Feature Elimination (RFE) technique is used in this work because it generally showed better performance in the work cited.

Classification. Classification is the final stage of EEG signal processing, in which it is decided which action or command should be executed. Feature selection outputs a feature vector that is used to classify the data into different classes. Classifiers that follow the supervised learning approach use sets of labeled examples called training sets. Such a set contains several labeled samples of each class, so that the classifier is able to recognize new samples and assign them to one of the classes in the set. There are several supervised classification algorithms, such as the Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA). In this work we chose the SVM classifier, based on its performance reported in [15].
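As an illustration of these last two stages, the following hedged sketch wires RFE around a linear SVM with scikit-learn. The variable names and the number of retained features are assumptions, not values reported by the authors.

```python
# Hedged sketch: wrapper-based feature selection (RFE) followed by an SVM,
# the combination named above, applied to the SSVEP PSD features.
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def make_ssvep_classifier(n_features_to_select=16):
    # RFE repeatedly fits the linear SVM and discards the least useful
    # features until the requested number remains; the surviving features
    # are then used by the final SVM.
    return Pipeline([
        ("select", RFE(estimator=SVC(kernel="linear"),
                       n_features_to_select=n_features_to_select)),
        ("svm", SVC(kernel="linear")),
    ])

# Usage (X: (n_trials, n_features) PSD features, y: attended-target labels):
#   clf = make_ssvep_classifier().fit(X_train, y_train)
#   accuracy = clf.score(X_test, y_test)
```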
3 Related Works

The main works that contributed to the development of this paper are presented below.

In Development of an SSVEP-based BCI spelling system adopting a QWERTY-style LED keyboard [12], a speller system was developed in the QWERTY layout using 30 LEDs, one for each key of the keyboard, flickering at different frequencies. This method allows the individual to select a character without the multiple steps of traditional BCI speller systems. It was possible to obtain a fine frequency resolution, strictly recognizing, for example, flickering stimuli separated by 0.1 Hz. The experiments were performed with ten healthy subjects, five participating in an offline experiment and five in an online experiment, and 68 English words were used for the evaluations. In the offline results, accuracies of 76.67% and 72.33% were obtained for viewing angles of 40 and 30 degrees, respectively. The online results were better because the best angle and the best combination of electrodes (Oz and O2 in the 10-20 system) were used, with the accuracy depending on the time participants took to recognize each character: 5 s (84.69%), 6 s (86.17%) and 7 s (89.53%). From this work it was possible to obtain important information about the distance and positioning angle of the LEDs for a better result, as well as the best electrode positions.

In A novel stimulation method for multi-class SSVEP-BCI using intermodulation frequencies [4], a method was developed using different intermodulation frequencies for SSVEP-BCIs with targets flickering at the same frequency of 15 Hz, a setup that allows a greater number of targets. The authors encoded nine target objects on an LCD screen, arranged as squares in a 3 x 3 matrix. The modulation frequency for each target was generated by a color characteristic (C), alternating frames in green, red and gray; a luminance characteristic (L), alternating frames with a difference of 20 cd/m²; and a mixture of the two (CL), forming three approaches. As a result, the average accuracy for the online assessment of the three approaches was 85%, with the mixture of the two (CL) reaching the highest value of 96.41%. This work presents alternatives within the SSVEP paradigm that make it possible to recognize different targets flickering at the same frequency.

In Towards an optimization of stimulus parameters for brain-computer interfaces based on steady state visual evoked potentials [5], the influence of several characteristics of the visual stimulus on the SSVEP signal is presented. Five characteristics were evaluated for the targets: size, distance, color, shape and the presence of a fixation point in the middle of each flickering object. The distance between the stimulation targets and the presence or absence of the fixation point had no significant effect on the results, whereas the color and size of the flickering target played an important role in the SSVEP response. Experiments were performed with 5 subjects, and four stimuli were presented on the monitor screen with different flickering frequencies. A group of LEDs was added adjacent to each object shown on the screen, responsible for randomly generating the imposed luminance. The spectral responses were largest for white, followed by yellow, red, green and blue. Regarding object size, the quality of the spectral information grows proportionally with the size of the object. The other features did not have relevant effects in that study. This work provided important information for characterizing the environment in which the prototype of our work is inserted.

The work Use of high-frequency visual stimuli above the critical flicker frequency in a SSVEP-based BMI [21] presents an evaluation using frequencies above those traditionally used in SSVEP-BCI systems. Green (low luminance) and blue (high luminance) LEDs were used to verify the accuracy of the system and the level of visual fatigue of the subjects. Subjects fixated green and blue flickering lights (30 and 70 Hz, respectively), and the SSVEP amplitude was evaluated. The subjects were asked to indicate whether the stimulus was visibly flickering and to report their subjective level of discomfort. The study also evaluated visible frequencies (41, 43 and 45 Hz) against invisible frequencies (61, 63 and 65 Hz). As a result, accuracies of 93.1% and 88% were obtained for the visible and invisible stimuli, respectively. In addition, it was concluded that high frequencies continue to offer good performance and that visual fatigue is reduced. In our paper we investigate the use of high flicker frequencies (invisible to the human eye) to approach a real situation.

The related works presented encouraged the use of new concepts beyond the traditional SSVEP method. These methods can contribute to an SSVEP-BCI system applied in a real situation. The next section presents the conduct of the preliminary experiments.

4 Preliminary Experiments

This section presents the two experimental sets that are the basis for our investigation:
1. Development of code for the evaluation of a public SSVEP-BCI database; and
2. Construction of a prototype using traffic lights with LEDs as flickering targets.
Initially, we demonstrate the results of the code produced as part of this work to evaluate a public database.
After that evaluation, a second experimental set was performed to evaluate a database produced by us, using a prototype of traffic lights built with LEDs, in which the LEDs produce traditional SSVEP stimuli based on flickering target frequencies. By analyzing these results, in addition to investigating new methods linked to SSVEP-BCI systems, it will be possible to develop a new BCI system for decision-making with non-flickering targets using the same physical components as the second experimental set. The proposal resulting from this research is in Sect. 5.

All experiments used the MNE-Python tool [9], a set of libraries written in the Python programming language for analyzing EEG and MEG data, together with the scikit-learn library (http://scikit-learn.org) for the computational intelligence routines, also written in Python.

4.1 Public Database SSVEP-BCI

This section describes the experiment performed with the AVI SSVEP database (http://www.setzner.com/avi-ssvep-dataset/), developed by [24] as part of a work by the same author [23] on a speller with dictionary support. First the database built by [24] is introduced, and then the algorithmic strategies developed by us are presented, detailing the loading and preparation of the data, the procedures and the results, respectively.

Description of the Public AVI SSVEP Database. The database contains EEG data measured from healthy subjects exposed to flickering targets to obtain SSVEP responses. Data were recorded using three electrodes (Oz, Fpz and Pz) positioned according to the 10-20 system; only the data from the Oz electrode are stored in the database, with Fpz used as reference and Pz as ground. A BenQ XL2420T LCD monitor with a 120 Hz refresh rate was used for stimulus generation. The EEG equipment was the g.USBamp, which has a sampling rate of 512 Hz and gold-plated electrodes moistened with electrolytic gel. During the experiment, subjects had to concentrate on targets of 2.89 cm² on the monitor screen, seated at a distance of 60 cm from it.

Two types of experiments were performed to compose this database. The first was performed with a single target (ST) to verify the existence of the VEP signal. Four subjects were used, each submitted to a single session, focusing on a single target for thirty seconds, four times. The frequencies chosen in each trial were random, but they were the same for every subject. The second experiment was performed with multiple targets (MT), adding seven targets at different frequencies. Five subjects were used in two sessions, focusing on the targets for sixteen seconds, ten times. In each trial the subject focused on one of the indicated flickering targets, and the indicated sequence was also random but the same for the five subjects.

Loading and Data Preparation. The code developed for the ST analysis was necessary because it deals with a single target; considering our main research on traffic lights, only one light will be lit at a time. The MT data were also analyzed because they offer a greater variety of samples, making it possible to construct and evaluate a greater combination of strategies.
In the ST data, each subject performed only one session with four trials; since twenty-seven samples are available per session, the training and test data could be divided in different proportions within the same session, so that 33% of the samples (9 samples) were used for training and 67% of the samples (18 samples) were used for testing. In the MT data, the training and test data of the classifier are divided across sessions because few samples are available, each session adding ten trials, with each subject performing two sessions. Thus, the second session of each subject, with ten samples, was used for training the classifier, and the first session of the same subject was used for the tests.

Experimental Procedures. Regardless of the division of the data for each experiment, the algorithms for preprocessing, feature extraction and selection, and classification were the same. Figure 1 shows the execution flow and the algorithms applied in each experimental stage.

Fig. 1. General flow of execution of the experiments, presenting the algorithms used in each step.

Generally, the classification algorithm uses different combinations of extracted features for training. In this experiment, the only feature extracted is the Power Spectral Density (PSD) of the SSVEP signal, which allows the classification model to be trained independently of its class. This is because, regardless of the stimulated frequency, the PSD should be higher at that frequency than at the non-evoked frequencies; thus, models trained on any frequency can be applied to classify any test sample.

Results. In the analysis of the results with the ST data, three combinations of data were used for training and testing, since each subject performed the same experimental sequence three times. Thus, the first portion was used to train the classification model and the second and third were used for testing, with the other two possible combinations tested likewise, giving three different possibilities. The best frequency range for feature extraction used a standard deviation equal to 0.3 (found by exhaustive search); that is, if the feature was extracted around a frequency of 6 Hz, the frequency range was 5.7 to 6.3 Hz.

Figure 2a presents the bar plot with the results of the experiment on the ST data. The best result was obtained with subject 4, whose accuracy for the three sessions was 100%. The worst result was with subject 3 using the first session as test, for which an accuracy of 14% was obtained. The overall mean accuracy of all subjects was 70.75%.

Fig. 2. Results of the experiment with the ST data from the AVI database.

The PSD charts were analyzed to explain the low results of subject 3. In the first session, the target evoked a 6.0 Hz signal, but the PSD is higher around 12.0 Hz. This affects both the training of the classifier and the use of these data for testing, resulting in low accuracy. Figure 2b presents the PSD of the first session performed by subject 4, who obtained the highest accuracy (100%). It can be observed that the PSD is highest around the evoked frequency while the remaining frequencies have low values; such data provide good classifier training and also result in good accuracy when used for testing.
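To make the frequency-band feature used in these evaluations concrete, the following hedged sketch reduces a PSD (the output of the extraction step in Sect. 2.1) to one band-power value per candidate stimulus frequency, using the ±0.3 Hz band described above. The list of target frequencies is a placeholder, not the database's actual set.

```python
# Illustrative sketch, not the authors' exact code: one band-power feature
# per candidate stimulus frequency.
import numpy as np

def band_features(psd, freqs, target_freqs, half_width=0.3):
    feats = []
    for f0 in target_freqs:
        band = (freqs >= f0 - half_width) & (freqs <= f0 + half_width)
        feats.append(psd[..., band].mean(axis=-1))  # mean power in the band
    return np.stack(feats, axis=-1)                 # shape (..., n_targets)

# The attended target can then be taken as the frequency whose band carries
# the most power, or the band powers can be passed to the SVM classifier.
```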
For the MT experiment, the second session of each subject was considered the better choice for classifier training. The best frequency range for feature extraction was again obtained with a standard deviation equal to 0.3. Figure 3a shows the bar plot with the results of the MT experiment. Most results were better when using the second session for training, with the exception of subject 2. The best results were with subjects 4 and 5, whose accuracy was 100% in both cases when training on the second session. The worst result was with subject 3, with accuracies of 50% and 60% when training on the first and second sessions, respectively. The overall mean accuracy of all subjects was 84%.

The PSD plots were analyzed to explain the low results of subject 3. Figure 3b presents the PSD of the first session performed by this subject: a 9.3 Hz signal was evoked, but the PSD is larger around 6.5 Hz. The tests performed with the database of [24] demonstrated that the code developed in our work can be used to evaluate an SSVEP-BCI system.

Fig. 3. Results of the experiment with the MT data from the AVI database.

Fig. 4. Traffic lights built with LEDs used in the experiment 2 prototype.

4.2 SSVEP-BCI System Based on Flickering Traffic Lights

In this experimental stage, we started building our own database for the evaluation of the prototype using traffic lights with flickering LEDs, as well as testing the operation of the EEG equipment used.

Description of the Equipment Used. For the development of the prototype, two traffic lights made of LEDs were used. Figure 4a shows the traffic light built into the rest of the prototype, constructed with three 10 mm diffuse LEDs in red, yellow and green. Figure 4b presents the traffic light built with three high-brightness 5 mm LEDs and one 3 mm high-brightness LED: two red, one yellow and one green (the 3 mm one). The two variants were constructed to verify the difference in the EEG signal when using diffuse or high-brightness LEDs, since the latter have a higher light intensity, despite causing visual discomfort.

The traffic lights are operated with the aid of an Arduino UNO (https://www.arduino.cc/), an open-hardware electronic prototyping platform that uses an ATmega328P microcontroller with 32 KB of flash memory and a 16 MHz clock. In addition to the LEDs connected to the traffic lights, a push button was added to manually start each session or to stop it if necessary.

The EEG equipment used in the experiments is the 32-bit OpenBCI board (http://openbci.com) with 8 channels for EEG/EMG/ECG (electroencephalogram/electromyogram/electrocardiogram) measurement, plus three auxiliary channels used for a gyroscopic sensor. The equipment can be expanded to 16 channels using the Daisy module that accompanies it. A headset produced with a 3D printer, the Ultracortex Mark 3 (https://github.com/OpenBCI/Ultracortex/tree/master/Mark 3), was used to hold the electrodes and the OpenBCI board. The electrodes used for the experimentation are made of a silver-silver chloride (Ag-AgCl) alloy, dispensing with electrolytic paste or gel and thus allowing easy placement of the headset on different subjects during an experiment.

Experimental Procedures.
To simulate the traffic light with LEDs at flickering frequencies, code was developed for the microcontroller that allows the frequency of each LED to be specified. In a conventional SSVEP-BCI experiment it is desirable for the multiple targets to flicker at different frequencies, so Eq. 1 was applied in the Arduino code: the interval I between LED toggles is obtained by dividing one by the desired frequency f, dividing by 2 to account for the on/off half-cycle of the LED, multiplying by 1000 to express the time in milliseconds, and finally subtracting n, which is the delay of the code loop running on the hardware. This delay was measured using an LDR light sensor connected to an Arduino: the sensor was pointed at the LED lit at different frequencies, and the sensor readings were sent to the computer for analysis as a graph over time. The delay was found to vary from 1 to 2 ms, so its average value (1.5 ms) was assigned to n.

I = ((1 / f) / 2) * 1000 - n    (1)

For example, for f = 8 Hz and n = 1.5 ms, Eq. 1 gives I = 62.5 - 1.5 = 61 ms.

The following frequencies were configured for the LEDs: red = 8 Hz, yellow = 10 Hz and green = 12 Hz. Frequencies that are not multiples of one another were chosen, which prevents overlap in the spectrogram, since the signal magnitude is also high around multiples of the evoked frequency. Figure 5 shows a flowchart of the experiment, detailing the software and hardware used, as well as the communication model between them.

The EEG signal is obtained from the OpenBCI board with the OpenBCI GUI v2 software (https://github.com/OpenBCI/OpenBCI GUI). This software streams the acquired signal over the Lab Streaming Layer (LSL) interface (https://github.com/sccn/labstreaminglayer) to a program written in Python that receives the EEG signal and writes it to a FIF file (the MNE file format), together with the markers received from the microcontroller's serial port. Such markers are time indications denoting the moment each light of the traffic light was lit (a sketch of this acquisition path is given after the protocol list below).

Fig. 5. Representation of the flow of experiment 2.

This stage of the experiments was performed with only one subject, since the objective was to test the correct functioning of the EEG equipment and to verify whether the prototype is sufficient to evoke a good SSVEP signal. The following protocol was adopted for the sessions:
– Indoor environment with low luminosity.
– Subject seated approximately one meter away from the target.
– Subject exposed to two sessions. Figure 6c shows how the sequence of a session is performed. In each session the SSVEP signal was evoked twenty times with a random light sequence at the target. During the session, each LED is active for 10 s, with intervals of 5 s between one activation and the next. In this way, a session lasts 15 min and 42 s.
– EEG data and markers recorded in a single FIF file (the MNE format) in a database for further offline analysis.
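The following is a hedged sketch of the acquisition path described before the protocol list: EEG samples are pulled from the OpenBCI GUI's LSL stream, light-onset markers are read from the Arduino serial port, and both are written to a FIF file with MNE. The stream type, serial device, marker strings, channel scaling and sampling rate are assumptions, not details reported by the authors.

```python
# Hedged sketch of the LSL-to-FIF recording step (assumed device names).
import numpy as np
import serial
import mne
from pylsl import StreamInlet, resolve_stream

SFREQ = 250.0                                # assumed OpenBCI sampling rate
CH_NAMES = ["O1", "Oz", "O2", "PO3", "PO4", "PO7", "PO8", "Pz"]

inlet = StreamInlet(resolve_stream("type", "EEG")[0])
arduino = serial.Serial("/dev/ttyACM0", 115200, timeout=0)  # assumed port

samples, marks = [], []
while len(samples) < int(60 * SFREQ):        # record one minute as a demo
    chunk, _ = inlet.pull_chunk(timeout=1.0)
    samples.extend(chunk)
    line = arduino.readline().decode(errors="ignore").strip()
    if line:                                 # e.g. "RED", "YELLOW", "GREEN"
        marks.append((len(samples), line))   # approximate onset in samples

data = np.asarray(samples).T * 1e-6          # assume uV from the board -> V
raw = mne.io.RawArray(data, mne.create_info(CH_NAMES, SFREQ, "eeg"))
raw.set_annotations(mne.Annotations(
    onset=[s / SFREQ for s, _ in marks],
    duration=[10.0] * len(marks),            # each light stays on for 10 s
    description=[m for _, m in marks]))
raw.save("session_raw.fif", overwrite=True)
```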
Fig. 6. Illustrations of the protocol for experiment 2.

The electrodes were positioned on the subject's scalp at locations over the occipital, parieto-occipital and parietal lobes, respecting the 10-20 system. Figure 6a shows the positions of the eight electrodes that measure the EEG signal (O1, Oz, O2, PO3, PO4, PO7, PO8 and Pz), plus two electrodes used for reference and grounding (Fz on the frontal lobe and A2 on the right earlobe, respectively). Finally, Fig. 6b shows the complete assembly of the OpenBCI board connected to the Ag-AgCl electrodes together with the Ultracortex Mark 3 headset.

Results. The code was developed with some modifications relative to that used in experimental set 1. In this experiment we added the CAR (Common Average Reference) spatial filter, taking as reference the channels Oz, O2, PO4 and PO7, as they were the channels with the highest VEP response, in addition to FIR filters (Hamming window) with cut-off frequencies of 5 Hz and 50 Hz and a notch filter at 60 Hz and 120 Hz. The training and test data were divided into 30% and 70% portions, respectively, in a cross-validation scheme in which the first 30% of the trials (the first six) were used for training and the remainder for testing, then trials two to seven for training, and so on until fifteen different combinations were completed (a sketch of this scheme is given below).

Fig. 7. Accuracy of the results obtained from the cross-validation of experiment 2.

Fig. 8. Evoked 8 Hz with multiple channels.

The best frequency range for feature extraction was obtained with a standard deviation equal to 1.0. This value was found by exhaustive search using the first 30% of the trials for training the SVM classifier. Figure 7 shows the results of experiment 2 using cross-validation. The best result was obtained with the 9th data portion used for classifier training, with an accuracy of 100%. The worst results were with the 8th and 14th portions, with an accuracy of 78% in both cases. The overall mean accuracy over the whole cross-validation was 86%. Figure 8 shows the PSDs of the session performed with stimuli at the frequencies of 8, 10 and 12 Hz, which obtained the highest accuracy (100%). It can be observed that the PSD is highest around the evoked frequency while the remaining frequencies have low values; such data provide good classifier training and also result in good accuracy when used for testing.
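A minimal sketch of the sliding cross-validation described above follows; the exact indexing is our reading of the text (20 trials per session, 6 consecutive trials for training, the remaining 14 for testing, shifted one trial at a time), so it should be taken as an assumption rather than the authors' implementation.

```python
# Hedged sketch: 15 train/test folds over 20 trials (30%/70% split).
import numpy as np

def sliding_window_folds(n_trials=20, train_size=6):
    for start in range(n_trials - train_size + 1):   # 15 folds for 20 trials
        train_idx = np.arange(start, start + train_size)
        test_idx = np.setdiff1d(np.arange(n_trials), train_idx)
        yield train_idx, test_idx

# Example: evaluate the classifier from Sect. 2.1 on each fold and average.
#   accuracies = [clf.fit(X[tr], y[tr]).score(X[te], y[te])
#                 for tr, te in sliding_window_folds()]
```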
Analyzing the results of experimental set 2, it was possible to find a set of flickering frequencies that can be used to evaluate the constructed traffic-light prototype, together with the EEG equipment used for data acquisition. In the next section we present the directions we are taking to develop an SSVEP-BCI system with non-flickering targets.

5 Towards a New SSVEP-BCI System

In this section we present some hypotheses raised by the research carried out in the previous experiments. The idea is to develop an SSVEP-BCI system for decision-making at traffic lights in which the targets do not have a visible flicker frequency. In this context, the decision to be made is to determine which of the lights of a traffic light is active. Thus, the objective of this third experimental set-up will be to construct a new BCI system with targets that do not visibly flicker, approaching the real decision-making situation a driver faces when viewing a traffic light while driving.

Some hypotheses are presented for this future third experimental set, which consists of taking advantage of strategies of the SSVEP paradigm presented in the related works and in the previous experiments, using the same prototype as the second experiment.

The first hypothesis is to set targets with different flicker frequencies, such that these frequencies are not visible to the human eye.
Strategy: the system should be able to identify frequencies above those used in traditional SSVEP-BCI systems, which generally use frequencies of up to 30 Hz. This strategy will be applied by presenting increasing frequencies above 30 Hz (in 1 Hz steps) to a set of subjects. Each subject will indicate at what point the flicker is no longer visible, and three different frequencies not visible to the subjects will then be configured for the targets.
Potential problems: the work of [21] shows that SSVEP stimuli above the traditional frequencies can be used in BCI systems, but they are more difficult to detect because the SSVEP signal is very weak, which may imply low accuracy in the proposed system.

The second hypothesis considers the same flicker frequency for all targets when they are active, again with this frequency not visible to the human eye.
Strategy: the BCI system should be able to differentiate the color/luminance of the targets by the amplitude of the VEP response at the same stimulated frequency. The same procedure presented in the first hypothesis will be used to find frequencies not visible to the human eye, and the lowest of them will be used to configure the targets. The VEP responses of the different targets will then be analyzed, using the amplitude difference as the main feature. Supporting this hypothesis, the work of [1] shows how the colors used as targets can influence the phase value in an SSVEP system.
Potential problems: even with different values for the colors/luminance of the LEDs, such values may be poorly discriminative, resulting in low accuracy of the proposed system. For this reason, a third and last hypothesis is raised.

The third hypothesis is the development of a BCI system that combines the first and second hypotheses.
Strategy: this model will use a combination of the two previous hypotheses with the premise of improving the performance of the proposed system. Assuming positive (though not necessarily good) results for the first and second hypotheses, the intention of this strategy is to obtain the maximum performance from the two strategies used. For this, the classifier training model must be applied to a data sequence containing at least all possible combinations of the different frequencies and the different LED colors.
Potential problems: the problems of this hypothesis are the same as those presented for the first and second hypotheses. In addition, different flicker frequencies can evoke different values in the VEP signal regardless of the colors of the LEDs, because there are evoked frequencies at which the VEP response is stronger than at others.

In this work we identify new strategies that can be used in SSVEP-BCI systems applied to real situations; in our context, we want to apply them to aid decision-making at traffic lights. Summing up the hypotheses raised, the next step is to develop and evaluate a system based on these concepts.
6 Conclusion

In this paper we investigate SSVEP-BCI systems, evaluate a public database using newly developed code, and create our own database through a simulation of decision-making with traffic lights. The decision process applied has well-known actions: when the light is green the driver can continue driving normally; on red the driver must decelerate and stop the car; and, for some traffic-light models, on yellow the driver should pay more attention at the intersection and consequently reduce the speed of the vehicle. However, we verified that the traditional SSVEP-BCI system usually uses flickering frequencies visible to the human eye, which makes such a model unfeasible for future real situations. The bibliographic survey of related works allowed us to identify characteristics of this model that can be useful for developing a simulation closer to reality. It was also possible to identify other factors in the methodology of these works that contribute to the development of our system: the algorithms used in SSVEP signal processing, the session time performed by the subjects, rest time, number of times each experiment was performed, possible experimental scenarios, position of the electrodes for EEG acquisition, etc.

The practical experiments carried out have already contributed to much of what we want to develop, since it was possible to evaluate the code developed, the prototype built and the EEG equipment used, besides generating satisfactory results for our research. With this, it was possible to formulate hypotheses for the new system. The first two hypotheses will certainly be developed, while the third will be developed considering the results obtained in the first and second, resulting in the proposed SSVEP-BCI system.

Acknowledgment. We would like to thank CNPq (Brazilian Council for Scientific and Technological Development), Brazil, for scholarship 311685/2017-0.

References

1. Cao, T., Wan, F., Mak, P.U., Mak, P.I., Vai, M.I., Hu, Y.: Flashing color on the performance of SSVEP-based brain-computer interfaces. In: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 1819-1822. IEEE, San Diego, August 2012
2. Carvalho, S.N., Costa, T.B., Uribe, L.F., Soriano, D.C., Yared, G.F., Coradine, L.C., Attux, R.: Comparative analysis of strategies for feature extraction and classification in SSVEP BCIs. Biomed. Signal Process. Control 21, 34-42 (2015)
3. Chaudhary, U., Birbaumer, N., Ramos-Murguialday, A.: Brain-computer interfaces for communication and rehabilitation, pp. 513-525 (2016)
4. Chen, X., Wang, Y., Zhang, S., Gao, S., Hu, Y., Gao, X.: A novel stimulation method for multi-class SSVEP-BCI using intermodulation frequencies. J. Neural Eng. 14(2), 026013 (2017)
5. Duszyk, A., Bierzyńska, M., Radzikowska, Z., Milanowski, P., Kuś, R., Suffczyński, P., Michalska, M., Labecki, M., Zwoliński, P., Durka, P.: Towards an optimization of stimulus parameters for brain-computer interfaces based on steady state visual evoked potentials. PLoS ONE 9(11), e112099 (2014)
6. Fazel-Rezai, R., Ahmad, W.: P300-Based Brain-Computer Interface Paradigm Design. INTECH Open Access Publisher (2011)
7. Fouad, M.M., Amin, K.M., El-Bendary, N., Hassanien, A.E.: Brain computer interface: a review. In: Hassanien, A.E., Azar, A.T. (eds.) Brain-Computer Interfaces: Current Trends and Applications, pp. 3-30.
Springer International Publishing, Cham (2015)
8. Graimann, B., Allison, B., Pfurtscheller, G.: Brain-computer interfaces: a gentle introduction. In: Graimann, B., Pfurtscheller, G., Allison, B. (eds.) Brain-Computer Interfaces. The Frontiers Collection, pp. 1-27. Springer, Heidelberg (2010)
9. Gramfort, A., Luessi, M., Larson, E., Engemann, D., Strohmeier, D., Brodbeck, C., Goj, R., Jas, M., Brooks, T., Parkkonen, L., Hämäläinen, M.: MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013). http://journal.frontiersin.org/article/10.3389/fnins.2013.00267
10. Halder, S., Pinegger, A., Käthner, I., Wriessnegger, S.C., Faller, J., Antunes, J.B.P., Müller-Putz, G.R., Kübler, A.: Brain-controlled applications using dynamic P300 speller matrices. Artif. Intell. Med. 63(1), 7-17 (2015)
11. Yang, B.-H., Yan, G.-Z., Wu, T., Yan, R.: Subject-based feature extraction using fuzzy wavelet packet in brain-computer interfaces. Signal Process. 87(7), 1569-1574 (2007)
12. Hwang, H.-J., Lim, J.-H., Jung, Y.-J., Choi, H., Lee, S.W., Im, C.-H.: Development of an SSVEP-based BCI spelling system adopting a QWERTY-style LED keyboard. J. Neurosci. Methods 208(1), 59-65 (2012)
13. Lin, K., Cinetto, A., Wang, Y., Chen, X., Gao, S., Gao, X.: An online hybrid BCI system based on SSVEP and EMG. J. Neural Eng. 13(2), 026020 (2016)
14. Lin, Y.-P., Wang, Y., Jung, T.-P.: Assessing the feasibility of online SSVEP decoding in human walking using a consumer EEG headset. J. NeuroEng. Rehabil. 11(1), 119 (2014)
15. Martišius, I., Damaševičius, R.: A prototype SSVEP based real time BCI gaming system. Intell. Neurosci. 2016, 18 (2016)
16. McCoy, E.J., Walden, A.T., Percival, D.B.: Multitaper spectral estimation of power law processes. IEEE Trans. Signal Process. 46(3), 655-668 (1998)
17. McFarland, D.J., McCane, L.M., David, S.V., Wolpaw, J.R.: Spatial filter selection for EEG-based communication. Electroencephalogr. Clin. Neurophysiol. 103(3), 386-394 (1997)
18. Mühl, C., Gürkök, H., Bos, D.P.-O., Thurlings, M.E., Scherfig, L., Duvinage, M., Elbakyan, A.A., Kang, S., Poel, M., Heylen, D.: Bacteria hunt: evaluating multi-paradigm BCI interaction. J. Multimodal User Interfaces 4(1), 11-25 (2010)
19. Prashant, P., Joshi, A., Gandhi, V.: Brain computer interface: a review. In: 2015 5th Nirma University International Conference on Engineering (NUiCONE), pp. 1-6. IEEE, Ahmedabad, November 2015
20. Regan, D.: Steady-state evoked potentials. J. Opt. Soc. Am. 67(11), 1475-1489 (1977)
21. Sakurada, T., Kawase, T., Komatsu, T., Kansaku, K.: Use of high-frequency visual stimuli above the critical flicker frequency in a SSVEP-based BMI. Clin. Neurophysiol. 126(10), 1972-1978 (2015)
22. Shenoi, B.A.: Introduction to Digital Signal Processing and Filter Design. Wiley-Interscience (2005)
23. Vilic, A., Kjaer, T.W., Thomsen, C.E., Puthusserypady, S., Sorensen, H.B.D.: DTU BCI speller: an SSVEP-based spelling system with dictionary support. In: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2212-2215. IEEE, Osaka, July 2013
24. Vilic, A.: AVI SSVEP dataset (2014). http://www.setzner.com/avi-ssvep-dataset
25. Zhu, D., Bieger, J., Molina, G.G., Aarts, R.M.: A survey of stimulation methods used in SSVEP-based BCIs. Intell. Neurosci.
2010, 1:1–1:12 (2010) Image-Based Wheel-Base Measurement in Vehicles: A Sensitivity Analysis to Depth and Camera’s Intrinsic Parameters David Duron-Arellano(&) , Daniel Soto-Lopez, and Mehran Mehrandezh University of Regina, 3737 Wascana Pkwy, Regina, SK S4S 0A2, Canada duad92@gmail.com, {sotolopd,mehran.mehrandezh}@uregina.ca Abstract. Image-based metric measurement has been widely used in industry for the past decade due to the recent advancement in processing power and also the unobtrusiveness of this method. In particular, this method is gaining atten-tion in the realm of real-time detection, classi?cation, and inspection of vehicles used in intelligent transportation systems for law enforcement. These systems have proven themselves as a plausible competition to under-the-pavement loop sensors. In this paper, we analyze the sensitivity in image-based metric mea-surement for vehicles’ wheel base estimation. Results lead to a simple guideline for calculating the optimal con?guration yielding the highest resolution and accuracy. More speci?cally, we address the sensitivity of the metric measure-ments to the depth (i.e., the distance between the camera and the vehicle) and also internal calibration parameters of the visible-light imaging system (i.e., camera’s intrinsic parameters). We assumed a pinhole projection model with added barrel effect, aka, lens distortion. A 3D video simulation was developed and used as a Hardware-in-the-Loop (HIL) testbed for veri?cation and valida-tion purposes. Through a simulated environment, three case studies were con-ducted to verify and validate theoretical data from which we concluded that the error due lens distortion accounted for 0.014% of the total error whereas the uncertainty in the depth of the vehicle with respect to the location of the camera accounted for 99.8% of the total error. Keywords: Image-processing Digital-metrologyVision-systems 1 Introduction As vehicle population has been increasing exponentially over the years, new and cost-effective technologies for monitoring and controlling the traf?c have been developed. Intelligent systems, such as vision-based vehicle classi?cation systems, have been continuously investigated for its affordability and ef?ciency. Two major applications of these systems are toll collection and law enforcement, which make use of a wide variety of techniques to detect, characterize, count and classify vehicles. These previously mentioned techniques are usually implemented in accordance to the 13-vehicle classi?cation scheme [1] described by the Federal Highway © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 19–29, 2019. https://doi.org/10.1007/978-3-030-02686-8_2 Administration (FHWA). This scheme is based on the classi?cation of vehicles’ wheelbase and number of axles. Also, even though, several technologies have been explored to comply with the FHWA regulations, under-pavement loop sensors have been the most broadly implemented ones, mainly because of its reliability and robustness. Nevertheless, one of the biggest concerns of loop sensors is the intrusiveness that they entail. That is to say, when one of the sensors has to be replaced, the pavement needs to be removed and rebuilt again, which is an expensive, complex and time-consuming process. Alternative techniques, such as vision systems, which don’t involve any kind of intrusion without compromising accuracy, ef?ciency or affordability, are being further explored. 
Therefore, the purpose of our research narrows down to selecting a vision system and analyzing the sensitivity of its parameters, which would lead us to the most effective con?guration for accurately measuring wheelbase and counting axles. Through our analysis we conclude that lens distortion and depth assumption are regarded as the parameters that carry the biggest error in metrology applications of this nature. Due to the convexity of the lens, the error on the output measurements obtained from an image grows non-linearly as the observed features approximate the borders of the image. Also, as depth is not an implicit parameter in the vision system, it has to be assumed to provide the scale factor for the measurement, which is originally depicted in pixels. This assumption carries a range of uncertainty which accounts for signi?cant errors on the output. In this paper, the sensitivity of the latter are analyzed by means of the Pinhole Projection (PHP) and the Brown-Conrady (BC) models and an optimized setup is proposed as a result of the analysis. Also, a 3D simulated environment is presented to run tests for veri?cation and validation of the theoretical data. 2 De?nition of Parameters In the case study presented in this paper, it is ?rst assumed that a camera is located on the side of a 2-lane freeway regulated under the FHWA. The objective of this analysis is to observe how the wheelbase estimation is affected due to major uncertainties in the process, provided an assumed depth. As it is depicted in Fig. 1, there are six parameters involved in the process. Namely, wheelbase Wl, vehicle length Vl, vehicle width Vw, lane width lw, distance from the camera to the center of the lane Zcand position of the vehicle in the lane Pl. As well as the assumed depth Za and the real depth Zr, which are not depicted in Fig. 1. Even though all these parameters may vary, we can assume that given speci?c conditions such as location and ?xed con?guration, or because they do not have a direct relation to the output, most can be disregarded as uncertainties. Thus, the assumed depth Za and the lens distortion are regarded as the major uncertainties, and the only ones that pertain to the analysis. 20 D. Duron-Arellano et al. Vehicle length. Even though the uncertainty due to vehicle length is disregarded as it is implicit in the wheelbase and thus irrelevant, it is initially relevant when de?ning the camera location to guarantee the required ?eld of view. Lane Width. For this case study the width of the lane is set to be 3.6 m, as it is the required width of a single lane on any rural/urban freeway according to the FHWA [2]. Wheelbase Length. According to the FHWA 13-vehicle classi?cation scheme, under the Function Class 11 depicted in Table 1, which describes the Urban Interstate Freeways statistics, the overall wheelbase distribution falls within 1 and 45 ft (0.3048 to 13.716 m). Nevertheless, at least 75.7% of the samples fall in the class 2 (Table 1), within 6 and 10.10 ft (1.8288 to 3.0784 m) and at least 93.8% of the samples fall within 6 and 23.09 ft (1.8288 to 7.0378 m). Although this parameter does not directly affect the process of wheelbase estimation, it is considered as it de?nes the required ?eld of view. Moreover, the understanding of the distribution helps us narrowing down the case study. 
Since we can observe that the variation of wheelbase is considerably broad, it is important to note that, as the wheels' positions move within the image frame, the estimation is subjected to higher distortions due to lens convexity as the features of interest (the wheels) approach the edges.

Fig. 1. Camera located on the freeway side, perpendicular to the vehicle.

Table 1. Urban Interstate Freeways wheelbase range for the FHWA 13-vehicle classification scheme, Function Class 11 [7]

Class | Vehicles on the road distribution (%) | Wheelbase range (ft)
1  | 0.2  | 1.00-5.99
2  | 75.7 | 6.00-10.10
3  | 15.7 | 10.11-23.09
4  | 0.2  | 23.10-40.00
5  | 1.6  | 6.00-23.09
6  | 0.8  | 6.00-23.09
7  | 0    | 6.00-23.09
8  | 1.1  | 6.00-26.00
9  | 3.9  | 6.00-30.00
10 | 0.2  | 6.00-26.00
11 | 0.2  | 6.00-30.00
12 | 0.1  | 6.00-26.00
13 | 0.6  | 6.00-45.00

Position in the Lane Pl. As depicted in Fig. 2, the position of the vehicle in the lane directly affects the perceived dimension of the object. Therefore, for this case study it is assumed that the car moves only within the lane and that its position follows a normal distribution with an average location in the middle of the lane, 1.8 m from the sideline.

Vehicle width Vw. Just as with the position in the lane, the width of the vehicle also modifies the real depth, which in turn modifies the perceived dimension. Consequently, although the vehicle width may vary from virtually 0 to the maximum allowable width of 2.6 m established by the Federal-Aid Highway Act [3], the average vehicle width is 1.8 m [4], and this is the value considered for this analysis. Therefore, under the previously described assumptions on Pl and Vw, we can establish a variation in depth of P_l - V_w/2 = 1.8 m - 0.9 m = 0.9 m. It can be observed that the vehicle width and the position in the lane act simultaneously, since together they affect the real depth Zr, which deviates from the assumed depth Za, as described in the next section.

3 Depth Assumption Uncertainty

For this case study, a Canon EOS 7D with an EF-S 18-135 mm lens set at 50 mm (focal length) and an image resolution of 2592 x 1728 pixels has been used. The focal length in pixels, i.e. the distance between the lens and the point where the rays converge to a focus, obtained by means of the MATLAB Calibration Toolbox, is 5922.84 +/- 54 pixels. This focal length does not vary along this analysis, since the proposed system is a fixed one and the focal length has been chosen for the desired visibility at a given distance from the object of interest. As stated before, the vehicle is assumed to move within +/-0.9 m of the center of the lane. Therefore, since the assumed depth Za used in the estimation of the wheelbase should be the distance between the camera and the visible wheels (the outer face of the vehicle), the assumed depth must be 0.9 m before the center of the lane.

Fig. 2. (a) A vehicle close to the left lane line is perceived smaller; (b) a vehicle close to the right lane line is perceived bigger.

Assuming that the field of view is determined by the maximum length to be perceived, which is that of a single-trailer semi-truck (65 ft, about 20 m), by means of the PHP model we obtain

Z_c = X_c f / x = (20 m x 5922.84 px) / 2592 px = 45.70 m    (1)

where Zc is the distance to the object, which for this first calculation is assumed to be the distance to the center of the lane, Xc is the length of the object in meters, f is the focal length in pixels and x is the length of the object in pixels.
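As a quick sanity check of Eq. (1), the computation can be reproduced in a few lines; the variable names are ours and only the values stated above are used.

```python
# Numeric check of Eq. (1) with the values given in the text.
f_px = 5922.84   # focal length in pixels (from the MATLAB calibration)
x_px = 2592.0    # object length in pixels (full image width)
X_c = 20.0       # longest vehicle to be perceived, in metres

Z_c = X_c * f_px / x_px        # pinhole projection: Z_c = X_c * f / x
print(f"Z_c = {Z_c:.2f} m")    # -> Z_c = 45.70 m, as in Eq. (1)
```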
Since the distance to the center of the lane should be 45.70 m in order to perceive a maximum length of 20 m, the assumed depth Za, considering the outer face of the vehicle and the variations of its width and position in the lane, should be 44.80 m +/- 0.9 m. To illustrate the variation of the wheelbase estimation error due to the depth assumption, a random vehicle with a 2.5 m wheelbase (Xc) is considered. By means of the basic PHP model we can estimate that

x = X_c f / Z_c = (2.5 m x 5922.84 px) / 44.80 m = 330.51 px    (2)

where x is the estimated wheelbase in pixels and Zc is the distance to the object. This gives the wheelbase in pixels when the side face of the car is exactly 44.80 m away from the camera (the vehicle is centered), disregarding all other uncertainties. Nevertheless, as discussed before, the actual depth may vary by up to 0.9 m as the car moves within the lane. This uncertainty in depth is reflected as follows:

X_c = Z_c x / f = (44.80 m +/- 0.9 m) x 330.51 px / 5922.84 px = 2.50 m +/- 0.05 m    (3)

It is important to note that, since the convexity of the lens is not being considered, the variation of the estimate Xc is linear due to the linearity of the equation. It can also be observed that there is a 2% uncertainty in the estimation of the wheelbase. It then stands out that as either the distance from the camera to the object or the focal length increases, the variation of the estimate decreases. Nevertheless, this decrease in variation is directly proportional to a decrease in resolution. Therefore, the accuracy of the results relies on a point where both parameters, the variation due to the depth assumption and the resolution, are optimized.

4 Camera Intrinsic Parameter Uncertainty

For the case in which the intrinsic parameters are regarded as uncertainties, in our analysis barrel distortion together with tangential distortion accounts for the major variation. To account for the above-mentioned distortions, the BC equation (4) [5] has been utilized for the wheelbase estimation:

x_2 = x_1 (1 + k_1 r^2 + k_2 r^4) + 2 p_1 x_1 y_1 + p_2 (r^2 + 2 x_1^2)    (4)

where x_2 is the distorted point, x_1 and y_1 are the real point coordinates, k_1 and k_2 are the radial distortion coefficients of the lens, p_1 and p_2 are the tangential distortion coefficients, and r = sqrt(x_1^2 + y_1^2).

For this case study, following the previously stated assumptions, most importantly the camera location and configuration, and isolating this newly presented uncertainty source, two cases arise: (1) as the length of the vehicle increases, the features (axles) get closer to the edges and are thus subjected to higher distortions; (2) as the height of the wheels deviates from the average, set at the center of the projection, the features are also subjected to higher distortions as they get closer to the edges. As seen in Fig. 3, this situation for the camera lens used in this case analysis is represented by blue vectors on a unitary frame. These vectors represent the deviation of the pixels from their real location before the lens distortion, and it can be observed that the deviation is slightly bigger for the vectors at the bottom because of the tangential distortion. The radial distortion coefficients [-0.0941, 0.1017] and the tangential distortion coefficients [-0.0012, 0.0051] have been obtained through the Image Calibration Toolbox by MATLAB. Below we present two case studies.
In the ?rst one, we show how the perception of the location of the points-of-interest (POI) varies due to lens distortion depending on its location on the x-axis; this by analyzing two scenarios: the ?rst one with an average small vehicle and the second one with an average large vehicle. In the second case study, in the other hand, we analyze the variation depending on the position of the POI on the y-axis. For case 1, since the distortion grows exponentially, as seen in (4), the variation of wheelbase close to the center of the projection is less sensitive than that closer to the edges. Fig. 3. Barrel and tangential distortion on unitary frame. 24 D. Duron-Arellano et al. In a ?rst scenario, we assume that the small vehicles (from 2 to 6 m long) will present the lesser variation for its proximity to the center of the image, as explained before. When the average small vehicle (4 m) is considered, according to the BC equation in (4), we observe a variation from the real value of 0.014 m. In a second scenario, considering the biggest vehicle assumed for this analysis, a single-trailer semi-truck (20 m), and according to Eq. 4, we obtain a variation of 0.152 m. It can be observed that the sensitivity increases with the length of the vehicle as the features approach the edges of the frame. When increasing the size of the vehicle 5 times the error not only increases but it does it non-linearly: more than 10 times, 0.152:0.014 or 10.85. For case 2, as stated before, the variations on wheels’ height also affect the output non-linearly as it deviates from the center in any direction. In order to obtain the lesser variation in the output, the center of projection of the camera is matched with the center of the axle of the average wheel, 16 in. (40.64 cm) [6], which is 20.32 cm apart from the pavement. In the ?rst scenario for this second case, taking a semitrailer-truck’s wheel as the highest allowable wheel size, 22.5 in. (57.15 cm), a maximum variation of 8.255 cm in the positive y-axis from the average height is considered. Then, by means of (4), we calculate an error of 0.0049 m. On the other hand, in the second scenario, when we consider the same variation of 8.255 cm but this time towards the negative direction in the y-axis, we now obtain an error of 0.0051 m. From this we can observe that the variation is slightly more sensitive when wheels are smaller than that when they are bigger than the average. As we can see in the representation of the distortions in Fig. 3, the distortions tend to be bigger in—y; this is attributed to the tangential distortion of the current camera setup. A similar process is followed when analyzing the sensitivity of the wheelbase estimation when the distance from the camera to the object (Zc) is considered to be uncertain and at the same time considering the image to be subjected to lens distortion. In this case, the image is subjected to two different uncertainty sources, which lead to even bigger variations on the wheelbase estimations. Nevertheless, it is well understood that resolution plays a bigger role when varia-tions due lens distortion can be minimized. That is to say, when having a closer picture of an object, the error due to a minimized lens distortion is compensated and even outperformed by the increase in resolution. This situation is possible since the barrel distortion is almost completely eradicated when undistorting the frames by means of the BC model [5] and the tangential distortion is negligible. 
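For reference, Eq. (4) can be written down directly. The sketch below uses the radial and tangential coefficients quoted above and hypothetical normalized point coordinates; it only illustrates how the distorted position of a point of interest is evaluated, and is not the authors' analysis code.

```python
# Brown-Conrady model of Eq. (4); coefficient values as quoted in Sect. 4
# (their signs follow the cleaned-up text and are therefore an assumption).
K1, K2 = -0.0941, 0.1017    # radial distortion coefficients
P1, P2 = -0.0012, 0.0051    # tangential distortion coefficients

def distorted_x(x1, y1):
    """Distorted x-coordinate of a normalized image point (x1, y1)."""
    r2 = x1 * x1 + y1 * y1
    radial = x1 * (1.0 + K1 * r2 + K2 * r2 * r2)
    tangential = 2.0 * P1 * x1 * y1 + P2 * (r2 + 2.0 * x1 * x1)
    return radial + tangential

# Example: a feature near the image centre vs. one near the border
# (hypothetical coordinates) to show the growing deviation towards the edge.
for x1 in (0.1, 0.8):
    print(x1, distorted_x(x1, -0.05))
```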
5 Validation and Veri?cation Using a 3D Simulated Environment Accuracy in wheelbase measurement requires actual values that are a challenge to collect due the nature of real-world scenarios. To gain a better understanding of how variations in the vehicle width and its position on the lane affect the accuracy of the result, a 3D simulated environment was created. The wheelbase of a rendered vehicle in Image-Based Wheel-Base Measurement in Vehicles 25 real time was displayed on a LED monitor and measured by counting the number of pixels between the center of each axle as well as physically measured with a ruler. The center of each of the wheels was denoted with a one-pixel red dot for easier reference. Unlike measuring the wheelbase of a real vehicle, with this method is possible to ?nd the wheelbase of the vehicle with absolute accuracy. This proposed methodology creates a validation tool to provide a simulated test bench for testing and evaluation of visual sensors used for inspecting wheelbase in a structured lab environment without having to leave the lab for in-?eld testing for the ?rst time. In order to reproduce the setup of a camera located aside the freeway as shown in Fig. 1, a video camera was placed in front of the LED monitor as displayed in Fig. 4. To simulate the depth change due the position of the vehicle within the width of the lane, the rendered vehicle was resized in order for the camera to perceive the size of the vehicle as portrayed in Fig. 5. Fig. 4. Experimentation setup of camera located on the freeway side of a lane, perpendicular to the vehicle. Fig. 5. (a) Vehicle in middle of left lane line is perceived at one size. (b) vehicle far in the left lane is perceived as smaller while the vehicle in (c) close to right lane line is perceived bigger. 26 D. Duron-Arellano et al. In the simulation setup, a LG LED LCD E250 V monitor with a native resolution of 1920 f 1080 pixels and a screen size of 54.85 cm diagonally was utilized to render a 3D simulation of a Class 2 vehicle. Also, a video camera Sanyo Xacti VPC-FH1 with a built-in lens set at 5.95 mm was used to record video at a resolution of 1920 f 1080 pixels. The focal length (f) in pixels 2181 ± 0.95 pixels was obtained by means of the MATLAB Calibration Toolbox. The ?eld of view is determined by the maximum length to be perceived, for this experiment the maximum length is the width of the monitor 47.8 cm, and by means of the Eq. 1 we obtain that the distance of the camera to the monitor Zcis 54.3 cm, where Xc is the length of the monitor in centimeters, f is the focal length in pixels and x is the length of the monitor in pixels. Once we obtained the ideal distance of the camera to the monitor, we subtracted f = 5.95 mm and accomplished the ?nal distance of the camera with respect to the monitor as 53.7 cm. We achieved the alignment of the 1920 h 1080 pixels of the monitor with the 1920 h 1080 pixels of the video samples recorded with the camera through exhaustive calibration. With the above-mentioned parameters the absolute distance between the camera and the monitor Zr is 54.3 cm and to recreate the variation in depth as Fig. 5 demonstrates, the rendered size of the vehicle was decreased in case 2 by 5 pixels to illustrate a higher depth and increased by 5 pixels in case 3. For all these cases, the recreated wheelbase Xr is 5.85 cm and the assumed value of Za is 54.3 cm. 
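The geometry of this monitor set-up can be cross-checked with the pinhole relation of Eq. (1). The numbers below are those stated in this section, and the predicted wheelbase of about 235 px agrees with case 1 of Table 2; a minimal sketch, with variable names of our own:

```python
# Pinhole check of the simulated set-up (values from this section).
f_px = 2181.0        # focal length of the Sanyo camera in pixels
monitor_w_cm = 47.8  # width of the monitor in centimetres
monitor_w_px = 1920  # monitor width in pixels

# Camera-to-monitor distance so that the full monitor width is seen (Eq. 1).
Z_c = monitor_w_cm * f_px / monitor_w_px
print(f"Z_c = {Z_c:.1f} cm")     # -> about 54.3 cm

# Expected wheelbase in pixels for Xr = 5.85 cm at the assumed depth (Eq. 2).
X_r, Z_a = 5.85, 54.3
x_px = X_r * f_px / Z_a
print(f"x = {x_px:.1f} px")      # -> about 235 px (case 1 in Table 2)
```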
One video was recorded for each case, and from each video a frame was extracted for analysis at the moment the rendered vehicle was located closest to the center of the field of view. Each frame was analyzed to obtain the wheelbase x for the PHP model and was undistorted using the radial distortion coefficients [-0.1604, 0.0653] and the tangential distortion coefficients [7.5313e-04, -6.0965e-04] obtained through the Image Calibration Toolbox by MATLAB. The wheelbase in pixels x was measured in each of the six pictures. The measurement was made using the area of pixels with the highest red contrast denoting the center of each wheel. By measuring the corresponding values of x1-x3 and Zc1-Zc3, we observed that for each of the extracted frame samples the wheelbase x values were exactly the same number of pixels displayed on the monitor.

For this experiment we performed the wheelbase estimation for each of the recreated depth values of case 1, case 2 and case 3 as depicted in Fig. 5. By means of Eq. 3 we calculated the wheelbase distance in centimeters, Xc, for case 1, case 2 and case 3. In case 1, the assumed depth is fixed at 54.3 cm. From Table 2, using Eq. 3, it can be seen that the picture taken at the assumed depth Za shows a variation in X1 of 0.0020 cm for the PHP model and 0.0005 cm for the BC model, accounting for an error due to distortion of 0.021% for the PHP model and 0.004% for the BC model. For case 2, using the same Za value, a variation in X2 of 0.1240 cm for the PHP model and 0.1235 cm for the BC model was observed, accounting for an error due to distortion of 0.0004% for PHP and 0.0004% for BC, and causing a final total error of 2.119% for PHP and 2.111% for the BC model. Lastly, case 3 showed a variation in X3 of 0.1235 cm for PHP and 0.1247 cm for the BC model, accounting for an error due to distortion of 0.029% for PHP and 0.008% for BC, and causing a final error of 2.111% for PHP and 2.132% for the BC model.

6 Conclusions

In this paper, by means of the Pinhole Projection and the Brown-Conrady models, we analyzed how the wheelbase estimation is affected by the major uncertainties in the measuring process when a certain depth is assumed. Through a simulated environment, three case studies were conducted to verify and validate the theoretical data, from which we can conclude that in the three cases the error due to radial and tangential distortion was at most 0.03%, accounting for 0.014% of the total error for the PHP model and 0.004% for the BC model in case 3, whereas the uncertainty in the depth of the vehicle with respect to the location of the camera represented an error of up to 2.132% in Xc3, accounting for 99.8% of the total error. The distortion model has proven to reduce the sensitivity of the wheelbase estimation, although further applications should prioritize estimating depth accurately, since it is the most sensitive source of variation and accounts for the highest errors. Finally, it can also be concluded that for metrology applications based on vision systems, even though there are several uncertainty sources to be considered, apart from correction models, resolution and processing speed, precise measurements depend to a very high degree on an accurate estimation of depth.

References

1. Hallenbeck, M.E., Selezneva, O.I., Quinley, R.: Verification, Refinement, and Applicability of Long-Term Pavement Performance Vehicle Classification Rules. No. FHWA-HRT-13-091 (2014)
2. Stein, W.J., Neuman, T.R.: Mitigation Strategies for Design Exceptions. No.
FHWA-SA-07-011 (2007)
3. Weingroff, R.F.: Federal-aid highway act of 1956: creating the interstate system. Public Roads 60(1) (1996)
4. DoT, U.S.: Federal size regulations for commercial motor vehicles (2004)
5. Brown, D.C.: Decentering distortion of lenses. Photogramm. Eng. 32(3), 444-462 (1966)
6. Blow, P.W., Woodrooffe, J.H., Sweatman, P.F.: Vehicle Stability and Control Research for US Comprehensive Truck Size and Weight (TS&W) Study. No. 982819. SAE Technical Paper (1998)
7. Hajek, J.J., Selezneva, O.J., Mladenovic, G., Jiang, Y.J.: Estimating Cumulative Traffic Loads, Volume II: Traffic Data Assessment and Axle Load Projection for the Sites with Acceptable Axle Weight Data, Final Report for Phase 2. No. FHWA-RD-03-094 (2005)

Table 2. Results for Case 1, Case 2 and Case 3

Case (observed x) | Model | Wl recreated wheelbase (cm) | Za assumed depth (cm) | Zr recreated depth (cm) | x observed wheelbase (px) | Zc calculated depth (cm) | Xc calculated wheelbase (cm) | Xc offset w.r.t. Wl (cm) | % error due to distortion | % error due to Za and distortion
Case 1 (x1 = 235 px) | PHP | 5.85 | 54.3 | 54.3 | 235.05 | 54.2814 | 5.8520 | 0.0020 | 0.021 | 0.034
Case 1 (x1 = 235 px) | BC  | 5.85 | 54.3 | 54.3 | 234.99 | 54.2953 | 5.8505 | 0.0005 | 0.004 | 0.009
Case 2 (x2 = 230 px) | PHP | 5.85 | 54.3 | 55.47 | 229.99 | 55.4733 | 5.7260 | 0.1240 | 0.004 | 2.119
Case 2 (x2 = 230 px) | BC  | 5.85 | 54.3 | 55.47 | 230.01 | 55.4708 | 5.7265 | 0.1235 | 0.004 | 2.111
Case 3 (x3 = 240 px) | PHP | 5.85 | 54.3 | 53.16 | 239.93 | 53.1774 | 5.9735 | 0.1235 | 0.029 | 2.111
Case 3 (x3 = 240 px) | BC  | 5.85 | 54.3 | 53.16 | 239.98 | 53.1663 | 5.9747 | 0.1247 | 0.008 | 2.132

Generic Paper and Plastic Recognition by Fusion of NIR and VIS Data and Redundancy-Aware Feature Ranking

Alla Serebryanyk1(B), Matthias Zisler2, and Claudius Schnörr1
1 University of Applied Sciences Munich, Munich, Germany
alla.serebryanyk@hm.edu, schnoerr@cs.hm.edu
2 Institute of Applied Mathematics, University of Heidelberg, Heidelberg, Germany
zisler@math.uni-heidelberg.de
http://schnoerr.userweb.mwn.de/

Abstract. Near infrared (NIR) spectroscopy is used in many applications to gather information about the chemical composition of materials. For paper waste sorting, with a small number of scores computed from NIR spectra and assuming more or less unimodally clustered data, a pixel classifier can still be crafted by hand using knowledge about chemical properties and a reasonable amount of intuition. Additional information can be gained from visual data (VIS). However, it is not obvious which features, e.g. based on color, saturation or textured areas, are finally important for successfully separating the paper classes in feature space. Hence, a rigorous feature analysis becomes inevitable. We have chosen a generic machine-learning approach to successfully fuse NIR and VIS information. By exploiting a classification tree and a variety of additional visual features, we could increase the recognition rate to 78% for 11 classes, compared to 63% when using only NIR scores. A modified feature ranking measure, which takes redundancies of features into account, allows us to analyze the importance of features and reduce them effectively. While some visual features like color saturation and hue proved to be important, some NIR scores could even be dropped. Finally, we generalize this approach to analyze raw NIR spectra instead of score values and apply it to plastic waste sorting.

Keywords: Near Infrared (NIR) Spectroscopy · Waste sorting · Visual Features (VIS) · CART · Feature ranking · Machine-learning

1 Introduction

More than 16 million tons of waste paper are processed each year in Germany [4].
At our partner facility around 130,000 tons per year are handled. A high sorting quality of the waste paper is critical to achieve a high grade of recy-cled paper while keeping the environmental footprint to a minimum. In [10], a general overview of many methods in the ?eld of paper waste sorting is given, s c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 30–45, 2019. https://doi.org/10.1007/978-3-030-02686-8_3 Generic paper and plastic recognition and redundancy-aware feature ranking 31 and the impact is emphasized these methods can have on the conservation of natural resources in terms of energy and water consumption, CO2-footprint, and environmental pollution. Ultimately, good knowledge about the input material may be used to optimize the parameters of the sorting facility, e.g. the conveyor belt speed. We address this paper sorting problem by using near infrared (NIR) and additional RGB (red-green-blue) visual data. From the visual data, we use the RGB and HSV (hue-saturation-value) color components and compute a huge variety of features consisting of classical and statistical texture sensitive features (VIS-features). There is also a strong need for optimizing the parameters of sorting facilities for plastic waste based on the composition of the input material in order to improve the throughput and the sorting quality. In the European Community alone there are 26 million tons of plastic waste to be sorted, only 30% of them are recycled1 . This is all the more important since China has denied to take the plastic waste from Europe any longer. The quality of the sorted output in terms of purity and attainable constant properties of sorts is crucial for the usability in many applications and thus for the price of the recycled materials. Our classi?er implementation of a Classi?cation and Regression Tree (CART) allows a ranking of the features by importance and thus can be used to select only the most important features. Furthermore, the complexity of the classi?er can be parameterized to create simpler decision trees which has proven to be more robust in case of high measuring errors and partly non-representative data. The optimal decision tree ultimately results by a cross-validation training scheme. For paper waste, we compare the classi?cation performance in three experi-ments: First, only NIR scores are used for training, then RGB and HSV data is added, and ?nally a whole variety of visual (VIS) features is combined. Based on the set of NIR and VIS features we were able to show the power of an importance ranking for an e?ective feature selection. For plastic waste, we have direct access to the raw spectra, so we can analyse the raw spectra of a NIR camera instead of pre-processed score values, as we were limited to do in the paper waste case. In this case the improved feature ranking is able to identify the wavelengths with most discriminative power for the trained plastic sorts. The rest of the paper is arranged as follow: in Sect. 2 the setting for the recording of the paper and plastic waste material is sketched and the character-istics of the available sensor data is described. Section 3 brie?y mentions classic approaches to analyse and classify waste material, and a list of feature ranking approaches is given, one of them based on the CART is pursued further and discussed in more detail in Sect. 4. In particular, in Sect. 4.2, our modi?cation of the CART feature ranking is given to adequately regard the redundancy of features. 
This modification is empirically verified by a synthetic data example. Section 4.3 states a modification to the pruning of the CART to improve its robustness. The preprocessing of the paper data and plastic spectra is stated in Sect. 4.4. Section 5 describes how the recognition rate could be increased from 63% to 78% by fusing NIR and VIS data, and the effectiveness of our feature ranking and reduction method is demonstrated on the paper features used and on the plastic spectra. Finally, Sect. 6 summarizes the main results and states ideas for future work.

1 According to a recent newspaper report.

2 Characteristics of Waste Data

2.1 Paper Data

Line scan cameras for NIR and RGB were used to image the conveyor belt transporting the waste paper. The system used in a real paper sorting plant recorded 172 NIR tracks and 1204 RGB tracks at 175 scans per second and a belt speed of around 0.5 m/s, and covered a width of circa 90 cm (see top of Fig. 1).

Fig. 1. Example visualization of the classification results on real-world paper data. The upper image shows the RGB data of a section of the conveyor belt. Each color in the lower image represents the recognized paper class. The background is colored black.

Overall, 29 NIR-based features or scores were used for the classification problem; they were processed from the raw NIR spectra similarly to [9]. A third-party project partner, a NIR camera manufacturer, provided these scores. They consist of 11 scores discriminating plastic versus paper, 15 scores sensitive to different paper classes, and 3 values measuring the content of characteristic chemicals: talcum, kaolin, and lignin. Plastic content may result from coated paper classes, adhesive tapes or foils, for example.

Table 1. Paper classes to be discriminated, with N = Sum_i N_i = 4,175,121 samples in total

Class index | Abbreviation | Description | Samples Ni
0  | BG    | Background | 853573
1  | ZD    | Newspaper | 473144
2  | MGWD  | Magazine/advertising print | 854485
3  | BP    | Bureau paper | 540297
4  | WPb   | Corrugated paper brown | 196494
5  | WPw-u | Corrugated paper white covered and uncoated | 217558
6  | WP-g  | Corrugated paper coated | 118834
7  | KA-u  | Carton package uncoated | 90218
8  | KA-g  | Carton package coated | 538842
9  | SV    | Other packages | 152433
10 | UN    | Unassigned objects | 139243

Based on the visual RGB data, a huge variety of features is computed, consisting of co-occurrence features, histogram moments, Haar wavelet filters, anisotropic Gaussian filters, and first and second order spatial derivatives for various mask widths and orientation angles (VIS features). The NIR scores and VIS features are then combined into a feature vector of dimension d, x in R^d, for each pixel of a track. The set of feature vectors X = {x_i}, i in {1,...,N}, together with a class label from labeled data, forms the training data set we operate on. Thus, NIR and VIS features are fused in these vectors and treated in a common way by the classifier and the feature ranking procedure. We discriminate 10 paper classes which were defined by a third-party project partner. The conveyor belt is treated as a separate background class. Thus, a total of 11 classes are discriminated for the results in this paper (see Table 1).

2.2 Plastic Data

To test the recognition of plastic waste, only one bottle per plastic class was available. The bottles were cleaned, and labels or markers were removed.
This is only a small data set, and the careful preparation is bound to lead to overly optimistic results in terms of recognition rates, but we wanted to check two aspects:

- Does our generic approach have a chance to be successfully transferred to the treatment of plastic waste?
- Can the feature selection analysis be successfully applied to raw NIR spectra as well, to overcome the need for expert experience in computing application-dependent score values?

For plastic objects, the NIR camera recorded 320 tracks perpendicular to the belt movement, in the range of 900-1200 nm and with a wavelength resolution of 256 values. The background was suppressed by an intensity threshold. For the training of the background as a separate class, some additional measurements were taken from an empty belt. The background data were reduced as in the paper data experiments, so that the background does not dominate the other classes and hence the determined recognition rate. Based on these data a labeled training set was built up. Note that some PET classes differ only in color. Table 2 lists all defined plastic classes.

Table 2. Plastic classes to be discriminated

Class index | Abbreviation | Description
0  | BG | Background
1  | PET raw | Polyethylene terephthalate raw material
2  | PET bottles | PET bottles
3  | PET blue | PET blue
4  | PET brown | PET brown
5  | PET green | PET green
6  | PET transp | PET transparent
7  | ABS | Acrylonitrile butadiene styrene
8  | PE | Polyethylene
9  | PE UHMW | PE ultra-high-molecular
10 | PE UHMW TG 1.2 | PE ultra-high-molecular TG 1.2
11 | PE hard | Polyethylene hard
12 | Polyester resin | Polyester resin
13 | PA | Polyamide
14 | PC | Polycarbonate
15 | PP | Polypropylene
16 | PVC hard | Polyvinyl chloride hard
17 | PAK | Polyacrylate

3 Related Work

NIR spectroscopy is a well-established technique for material identification in general and paper sorting in particular [9-11]. Besides characteristic absorption bands, first and second order derivatives are also used to preprocess the raw reflectance spectra. Smoothing filters like Savitzky-Golay are used to reduce noise in the derivatives [9]. Furthermore, Principal Component Analysis (PCA) is used to reduce the dimension of the feature space [7]. Classification is then carried out by evaluating several subsequent binary decision rules, for which Partial Least Squares (PLS) regression is applied. The order of these substeps is based on a sequence of manual analysis steps or on rather intuitive decisions. Along with PCA, other techniques for feature analysis, such as Fisher Linear Discriminant Analysis (LDA) or the divergence measure based on the Kullback-Leibler distance for probability distributions, among others, have been used for similar problems in pattern recognition [3]. Generally, the linear techniques PCA and LDA are only optimal if the class distributions are well separated and Gaussian in feature space. Well-known classifiers include Classification and Regression Trees (CART) [2], Randomized Trees or Random Forests [1] and Support Vector Machines (SVM), among many others [3]. Feature ranking can be done, e.g., by using a CART with surrogates [2], Randomized Trees [5], or Recursive Feature Elimination (RFE) using the weight parameters of trained SVMs [6].
We decided to use a CART classifier, since it is a rule-based and parameter-free technique which can handle a large number of features and performs well on arbitrary distributions, provided a large number of training samples is available, which is clearly the case in our application [2]. In [8], the approach of a generic data fusion of VIS and NIR data using a classifier and a machine-learning approach was first described. In the following sections, we describe the progress of this work and the first step towards applying the methods to the task of plastic waste sorting by analyzing whole raw NIR spectra.

4 Methodology

4.1 Classifier

We use our own C++ implementation of the CART algorithm, which is based on the principles presented in [2]. The CART algorithm trains a binary decision tree. In each node the pattern set is split at a threshold on a feature which minimizes the impurity of the resulting subsets. As impurity metric we use the Gini diversity index for a node t, as proposed by [2]:

i(t) = Sum_{j != k} p(j|t) p(k|t),    (1)

where the indices j and k represent different classes. A splitter s is defined by the feature which is used to split and the corresponding threshold. The decrease of impurity from one node to the left and right child nodes t_L and t_R by a splitter s is described by the delta impurity

Delta i(s, t) = i(t) - p_R i(t_R) - p_L i(t_L),    (2)

where p_L and p_R are the proportions of data in t_L and t_R, respectively. The splitter s which maximizes Delta i(s, t) is then used as the primary splitter. Each leaf of the tree finally represents a class. To use a trained classification tree, the tree is traversed for a given pattern according to the splits in each node, and the class of the reached leaf node is returned.

4.2 Feature Ranking and Selection

In order to rate the importance of features, surrogates are chosen in each node of the tree. That is, splitting thresholds for the other features not used in the primary splitter are sought such that the resulting child trees are most similar to the trees created by the original primary splitter. For each surrogate s* and the primary splitter s, the delta impurity measure from (2) is calculated. Finally, these delta impurities are summed over all nodes for each feature, which gives a measure M(x_m) for the importance of each feature x_m:

M(x_m) = Sum_{t in T} ( Delta i(s*_m, t) + Delta i(s_m, t) ),    (3)

where m in {1,...,d} denotes the index of the specific feature, T is the set of all nodes of the decision tree, and s*_m and s_m denote the surrogates and the primary splitter which involve feature x_m. As opposed to the importance measure found in [2], which ignores the delta impurity of the primary splitter, we deliberately included it, since we think the feature actually used in the primary splitter is important by definition. Tests with an artificially designed test dataset also yielded more realistic importance measures when the primary splitter was included. Moreover, we defined an importance measure M'(x_m) which only sums up the delta impurities of the primary splitter of each node, thus leaving out those of the surrogate splitters. This means that only features actually used by the classifier gain importance. This has the effect that the importance ranking selects between similarly important but redundant features, thus dropping unnecessary features, as we observed in the selection of characteristic wavelengths in raw NIR spectra of plastic waste (see Sect. 5.2).
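Before the synthetic validation that follows, the quantities in Eqs. (1)-(3) can be sketched against a toy node representation of our own choosing (class-count dictionaries). This is an illustration only, not the C++ implementation described above.

```python
# Toy illustration of Eqs. (1)-(3): Gini diversity index, delta impurity of a
# split, and (conceptually) the per-feature importance as a sum of delta
# impurities over tree nodes. The node representation is our own choice.
from collections import Counter

def gini(counts):
    """Gini index i(t) = sum_{j != k} p(j|t) p(k|t) = 1 - sum_j p(j|t)^2."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def delta_impurity(parent, left, right):
    """Delta i(s, t) = i(t) - p_R i(t_R) - p_L i(t_L)."""
    n = sum(parent.values())
    p_l = sum(left.values()) / n
    p_r = sum(right.values()) / n
    return gini(parent) - p_r * gini(right) - p_l * gini(left)

# One node split on a single feature: a large impurity decrease marks an
# informative split; summing such decreases per feature over all nodes where
# that feature is the primary splitter (and, for M, also a surrogate) gives
# the importance measures of Eq. (3).
parent = Counter(a=50, b=50)
left, right = Counter(a=45, b=5), Counter(a=5, b=45)
print(delta_impurity(parent, left, right))   # -> 0.32
```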
To validate this observation we created an artificial dataset comprising 1000 samples each of 11 overlapping Gaussian distributions with identity covariance matrices, i.e. they scatter isotropically. One distribution is centered at the origin, and the others are placed on the coordinate axes at increasing distances from the origin. These distributions overlap mostly with the distribution around the origin and not with each other. A sketch is given in Fig. 2 for d = 2 features. A CART classifier can easily separate the distribution centered at the origin from a distant distribution by one threshold on the corresponding coordinate axis, that is, on the corresponding feature. The farther apart a distribution is, the smaller the overlap and thus the more important that feature. When applying the CART, the measure M(x_m) leads to an increasing feature ranking of features 1, 2, ..., 10, as expected. In a next step, we replicated feature 5 in the data set as feature 11. Thus, these two features are completely redundant. As expected, these features are assigned the same importance by M(x_m), as shown in Table 3. Incidentally, a Randomized-Tree classifier leads to the same ranking result.

Fig. 2. A sketch of two isotropic Gaussian distributions overlapping to a different degree with the distribution centered at the origin. The circles represent the contour lines of the distributions. Feature x2 can separate classes 1 and 3 by a threshold better than x1 can separate classes 1 and 2, thus feature x2 is regarded as more important than x1 by the ranking measure.

Table 3. Normalized feature ranking by M(x_m) with two redundant features 5 and 11 ranked equally

Feature | Importance
10 | 1
9  | 0.90932413
8  | 0.86688438
7  | 0.76397420
6  | 0.66307053
5  | 0.65805597
11 | 0.65805597
4  | 0.47340730
3  | 0.18303054
2  | 0.11822442
1  | 0

Table 4. Normalized feature ranking by M'(x_m) with two redundant features 5 and 11. Note that feature 11 is ranked 0 in this case.

Feature | Importance
10 | 1
9  | 0.90932413
8  | 0.86688438
7  | 0.76397420
6  | 0.66307053
5  | 0.65805597
4  | 0.47340730
3  | 0.18303054
2  | 0.11822442
1  | 0
11 | 0

In contrast, when using the measure M'(x_m), the classifier decides to use feature 5 and rates the completely redundant feature 11 as worthless, as shown in Table 4. This is the sort of feature ranking we need to strongly reduce the feature count while retaining most of the information about the material classes.

4.3 Robustness Improvement

If the classifier is trained until each leaf contains a single training pattern, the classifier will likely be overfitted, since outliers are also 'learned by heart' and might be confused with representative data from other classes. This problem is addressed by an internal cross-validation scheme that prunes back the fully trained tree to some degree until it generalizes well on the given dataset. However, in a real-world scenario with changing side conditions, feature measurements might be slightly influenced by additional effects not covered by the original training dataset. We address this problem by continuing the pruning process of the trained tree to make it more robust against small changes in the measurement conditions. Incidentally, this also leads to simpler trees.

4.4 Data Preprocessing

Paper Data. The training data is compiled from mono-fraction recordings for each class. As a preprocessing step, the paper objects were separated from the background by using a threshold on the intensity of the visual data.
For the results in this paper, the visual resolution of 1204 pixels per scan was scaled down to the 172-pixel resolution of the NIR data by a simple data reduction. Since the background class of the conveyor belt turned out to be quite dominant and very well distinguishable from the paper classes, the background data was resampled to roughly the same amount as the next biggest classes. This avoids the overall recognition rate being too optimistic simply because of a good background recognition.

Plastic Data. According to [12], varying intensities from scan line to scan line were caused by varying distances between the camera and the objects and by diffuse scattering effects. Following the norming procedure described in [12], all spectra are normed so that

Sum_{i=1}^{d} |x_i| = const = 256,

where x_i is a component of the feature vector x in R^d, in this case the intensity value at a particular wavelength of the spectrum at a pixel of the scan track. Essentially, this normalization removes a constant bias. The constant value 256 is chosen to avoid inaccuracies due to floating point errors for very big or very small spectral values. Superimposed PP spectra, normalized and smoothed, are shown in Fig. 3 as an example. These spectra match quite well; they do not spread much vertically. Since the spectra do not show sharp peaks, no peak-retaining smoothing filter is necessary. We used simple Gaussian smoothing filters and calculated the first and second derivatives by derivated Gaussian filters as additional spectral features used in the material classification.

Fig. 3. Example of superimposed spectra for the plastic sort Polypropylene (PP) after normalization and smoothing, to show the variation in the spectra. The spectra do not spread much vertically after normalization (the color scale represents the frequency of overlapping spectra and can be ignored here).

5 Experimental Results

5.1 Paper Data

The dataset used for the following results consisted of almost 4 million samples, of which 80% were used as training set and 20% as validation set in a 3-fold cross-validation scheme. To be clear, the purpose of this cross-validation is to obtain the most accurate estimate of the real recognition rate. We emphasize that this dataset originates from a real sorting facility with all dirty effects like probe contamination, light scattering, changing detector-probe distances, shadow effects, etc. Using solely the given NIR features as described in Sect. 2.1, our classifier achieved an overall recognition rate of 63%. The classification statistics are given in Table 5, and the corresponding error matrix or confusion matrix F is visualized in Fig. 4. Ni/N is the fraction of data belonging to class i. The elements Fij of F are the number of samples from class i which are classified as class j, where i is the row index and j the column index. The diagonal elements of F represent the frequency of correct classification decisions, while the off-diagonals show false-positive and false-negative decision rates. From F the diagonal elements diag(F) are extracted and the F1 measure is computed. The F1 measure is the harmonic mean of precision and recall and thus also accounts for false positives and false negatives. The overall recognition rate is calculated as 1 - P(F), where P(F) is the error probability. Adding the RGB and HSV channels, the recognition rate could be raised to 69%.
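As an aside, the per-class statistics reported in Tables 5, 6 and 8 (F1, diag(F) and the overall rate 1 - P(F)) can be reproduced from an error matrix of raw counts in a few lines. This is a minimal sketch with variable names of our own, not the authors' evaluation code.

```python
import numpy as np

def classification_stats(F):
    """F[i, j] = number of samples of class i classified as class j (raw counts)."""
    F = np.asarray(F, dtype=float)
    n = F.sum()
    tp = np.diag(F)
    precision = tp / np.maximum(F.sum(axis=0), 1e-12)   # per predicted class
    recall = tp / np.maximum(F.sum(axis=1), 1e-12)      # per true class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    diag_share = 100.0 * tp / n            # contribution of each class, in percent
    recognition_rate = 100.0 * tp.sum() / n             # 1 - P(F), in percent
    return 100.0 * f1, diag_share, recognition_rate

# Tiny example with a made-up 3-class error matrix:
F = [[90, 5, 5],
     [10, 80, 10],
     [5, 15, 80]]
f1, diag_share, rate = classification_stats(F)
print(f1.round(2), diag_share.round(2), round(rate, 2))
```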
In a first attempt to include other features, a variety of 386 additional visual features were computed, consisting of co-occurrence features, histogram moments, Haar wavelet filters, anisotropic Gaussian filters, and first and second order spatial derivatives for various mask widths and orientation angles. The total of 419 features resulted in a recognition rate of around 77%. As a remark, the trained CART classifier consists of 484054 decision nodes and 33371 leaves in this case. Two reasons led us to the decision not to use a Randomized Tree (RT) instead of a CART: first, an RT ranks the features like a CART with surrogate rules, i.e. according to M(x_m); second, the couple of minutes needed to read in a trained RT consisting of e.g. 100 CART classifiers is somewhat prohibitive in a real facility environment.

Table 5. Classification statistics for all NIR features (d = 29)

Class i | Abbrev. | Ni/N (%) | F1 measure (%) | diag(F) (%)
0  | BG    | 16.65 | 95.09 | 16.169
1  | ZD    | 11.87 | 54.68 | 7.120
2  | MGWD  | 21.44 | 60.35 | 14.346
3  | BP    | 13.56 | 65.75 | 9.618
4  | WPb   | 4.93  | 43.68 | 2.284
5  | WPw-u | 5.46  | 36.32 | 1.702
6  | WP-g  | 2.98  | 36.03 | 0.736
7  | KA-u  | 2.26  | 19.23 | 0.276
8  | KA-g  | 13.52 | 68.98 | 9.060
9  | SV    | 3.83  | 30.82 | 0.789
10 | UN    | 3.49  | 34.39 | 0.858
Overall recognition rate 1 - P(F) = 62.958

Table 6. Classification statistics for the best d = 59 features selected among NIR, RGB, HSV and a mixture of visual features

Class i | Abbrev. | Ni/N (%) | F1 measure (%) | diag(F) (%)
0  | BG    | 16.65 | 96.49 | 16.026
1  | ZD    | 11.87 | 72.60 | 8.704
2  | MGWD  | 21.44 | 75.19 | 17.086
3  | BP    | 13.56 | 80.84 | 11.074
4  | WPb   | 4.93  | 82.79 | 4.079
5  | WPw-u | 5.46  | 70.18 | 3.629
6  | WP-g  | 2.98  | 63.42 | 1.641
7  | KA-u  | 2.26  | 69.81 | 1.457
8  | KA-g  | 13.52 | 75.57 | 10.242
9  | SV    | 3.83  | 62.53 | 2.172
10 | UN    | 3.49  | 61.99 | 1.973
Overall recognition rate 1 - P(F) = 78.082

By iteratively deleting the most unimportant features (according to the measure described in Sect. 4.2), the number of features could be reduced to just 59, while even improving the recognition rate slightly to 78%. The error statistics are listed in Table 6, and the corresponding error matrix F is visualized in Fig. 5. It is worth noting that the increase in recognition rate from 63% to 78% is mainly attributable to the paper classes and not to the background class (compare the F1 measures in Tables 5 and 6). An example of classified paper waste is shown at the bottom of Fig. 1, where the paper classes are labeled by different colors. To further illustrate the feature selection process and its relevance to the achievable recognition rate, Fig. 6 shows the recognition rate versus the number of selected features among the 419 total features. At the far right, when all NIR and VIS features are used, a 77% recognition rate is achieved. Surprisingly, when moving to the left in this plot, a further deletion of features results in a slight increase of the recognition rate, because the classifier is no longer troubled by useless and redundant information in the data set. The CART classifier is, however, a parameter-free approach and deals robustly with useless information. The most important result is that the features can be reduced down to 59 with no loss in the recognition rate, which leads to 78%. Only when reducing the features further does a significant decrease of the recognition rate result (see far left in Fig. 6).

Fig. 4. Visualization of the class error matrix F for the 29 NIR features. With i being the row index and j the column index, the elements Fij are the number of samples from class i which are classified as class j. Low values are colored in blue, high values in red.
Fig. 5. Visualization of the class error matrix F for the best 59 NIR+VIS features (see the peak in Fig. 6). The recognition rate is much improved compared to Fig. 4.

Thus, with appropriate feature selection, the computational cost can be reduced, since only the best visual features need to be computed. Interestingly, our feature ranking also showed that the H and S channels of the HSV data are quite important, which is also stated by [9]. More surprisingly, almost half of the original NIR features could be dropped from the remaining set of 59 features – even the values for talcum and lignin. While [10] states that rule-based classifiers like CART are generally too slow for real-time applications, we would be able to process at a conveyor speed of 4 m/s on a standard 4-core computer based on 29 NIR, 3 RGB and 3 HSV features, without the need for further hardware parallelization. This would be eight times the actual conveyor speed. When, however, exploiting many hundreds of visual features, more sophisticated data preprocessing steps need to be applied.

5.2 Plastic Data

In the first experiment, a CART classifier was trained for all 17 classes with 768 features. The size of the training data is big enough, and the classifier uses an internal cross-validation, so that overfitting is avoided. The class error matrix in Fig. 7 nevertheless shows an almost perfect recognition of all classes with 1 - P(F) = 89.57%. Even the five PET classes, which only differ in color and cause the most recognition errors, are recognized quite well. This is an overly optimistic result, of course, but it shows that it is worthwhile to proceed with our generic approach.

In the next experiment, only the most important classes from an application point of view are considered further by merging all PET classes (1–6) and all PE classes (8–11) into one PET and one PE class, respectively, and dropping classes 7, 12, and 17; see Table 7 and compare with Table 2.

Table 7. Most important plastic classes to be discriminated, with N = 537267 samples in total. The class index runs from 0, ..., c with c = 6 classes plus background
Class index i   Abbreviation   Class                         Pattern samples N_i
0               BG             Background                    192678
1               PET            Polyethylene Terephthalate    192676
2               PE             Polyethylene                  105113
3               PA             Polyamide                     12078
4               PC             Polycarbonate                 2641
5               PP             Polypropylene                 15059
6               PVC hard       Polyvinylchloride hard        17022

Fig. 6. Recognition rate over selected features. Best trade-off with 59 features and a recognition rate of 78%.

Table 8. Classification statistics for the 6 important classes with d = 768 features
Class index i   0      1      2      3      4      5      6
Class abbrev.   BG     PET    PE     PA     PC     PP     PVC
N_i/N           35.86  35.86  19.56  2.25   0.49   2.80   3.17
F1 measure      99.79  99.62  99.74  99.94  94.48  99.47  97.15
diag(F)         35.809 35.734 19.507 2.247  0.454  2.793  3.061
Overall recognition rate: 1 - P(F) = 99.604%

Fig. 7. Class error matrix for all plastic classes. The overall recognition rate is 1 - P(F) = 89.57%. Mostly the differently colored PET classes contribute to the recognition error.

Fig. 8. Class error matrix for the 6 important plastic classes. The overall recognition rate is 1 - P(F) = 99.604%.

Fig. 9. Recognition rate versus selected feature count for the Breiman measure (blue) and the primary-splitter-only measure (red). Fewer features are needed in the red case.

Figure 8 shows the related class error matrix, and Table 8 the classification statistics.
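The quantities reported in Tables 5–8 can all be read off the confusion matrix F. The following NumPy sketch shows one way to compute them (as percentages, like in the tables); it is illustrative and not taken from the paper.

```python
import numpy as np

def classification_statistics(F):
    """Per-class statistics from a confusion matrix F, where F[i, j] counts
    samples of true class i classified as class j (i = row, j = column)."""
    F = np.asarray(F, dtype=float)
    n = F.sum()
    tp = np.diag(F)
    precision = tp / np.maximum(F.sum(axis=0), 1e-12)
    recall = tp / np.maximum(F.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {
        "Ni/N": 100.0 * F.sum(axis=1) / n,          # class fractions
        "F1 measure": 100.0 * f1,
        "diag(F)": 100.0 * tp / n,                  # correct decisions
        "overall rate (1 - P(F))": 100.0 * tp.sum() / n,
    }
```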
As before, the recognition rate is very good, now almost 100%. The effect of considering only the primary splitter in the feature ranking is shown in Fig. 9. The recognition rate drops at a smaller number of features compared to the feature selection based on the original ranking criterion. That is because the ranking now selects between equally important but redundant features, thus dropping highly ranked but unnecessary features as well. Figure 10 shows the second derivative of the spectra of various plastic materials. The grey bars indicate the importance assigned to the wavelengths for this feature by the importance measure M(x_m). Wavelengths where this feature shows great diversity are rated high.

Fig. 10. Importance (grey bars) of the 2nd derivative of the spectra versus wavelength.

As mentioned above, these recognition rates are overly optimistic due to (a) the careful probe preparation and (b) the data set being far from representative of all possible appearances of plastic waste in a real facility. But the results show that even identical PET probes, only differently colored, can be recognized well, and that the feature selection scheme can be applied to whole raw NIR spectra too. This is all the more important as
– it is a generic approach without the need for any expert knowledge, and
– the amount of data of a raw spectrum is about eight times that of preprocessed score values, hence the need for data reduction increases considerably.

6 Conclusion and Outlook

The experimental results including additional visual features show a significant improvement over NIR scores alone. Our results on the real-world paper data confirm the preliminary results attained on a laboratory dataset with 14 different paper classes. The feature ranking of the CART classifier enables us to use many potential features at first and to automatically select only the best subset for a productive environment. The application of the material recognition methods to raw NIR spectra of plastic waste reveals that wavelengths can be selected in a generic way, where material classes exhibit characteristic diversity; thus preprocessed scores dependent on the experience of a particular camera manufacturer are no longer necessary. This way, the amount of data of raw spectra can be successfully reduced as well while retaining the crucial information.

For the future, we plan to exploit the full visual resolution in order to capture finer structural details in paper waste. At the same time, intelligent data fusion of multivariate data of different resolutions is needed to avoid resubstitution error due to partially replicated data. With a sevenfold higher resolution, the computational costs will also be a critical factor. Therefore, we want to investigate the applicability of a regional pre-clustering procedure and other data reduction techniques. We also intend to compare the feature ranking technique used in our CART classifier to other possible techniques, like e.g. l1-regularized data reduction. Compared to a simple RGB camera, a NIR sensor is rather expensive. Thus, it is also of interest whether visual features alone suffice to achieve an at least acceptable recognition rate for a lower price. Since real-world paper waste is not guaranteed to contain only paper, detection of problematic material such as inflammable materials or rigid objects which might damage the sorting plant would be much appreciated.
For these classes it is generally hard to gather much training data, as the variety of possible objects is huge. The recognition results for plastics on a small data set of raw NIR spectra are quite promising and encourage us to determine the recognition rates on a large scale in a real sorting facility for plastic materials as well.

References
1. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). ISSN: 0885-6125
2. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton (1984). ISBN: 978-0-412-04841-8
3. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2000). ISBN: 0-471-05669-3
4. Verband Deutscher Papierfabriken e.V.: Facts about Paper (2015). Brochure. Accessed 30 Nov 2015. http://www.vdp-online.de/en/papierindustrie/statistik
5. Genuer, R., Poggi, J.-M., Tuleau-Malot, C.: Variable selection using random forests. Pattern Recognit. Lett. 31(14), 2225–2236 (2010)
6. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1–3), 389–422 (2002). ISSN: 0885-6125
7. Jolliffe, I.T.: Principal Component Analysis. Springer Series in Statistics. Springer, New York (1986). ISBN: 0-387-96269-7
8. Klippel, P., Zisler, M., Schröder, F., Schleich, S., Serebryanyk, A., Schnörr, C.: Improvement of dry paper waste sorting through data fusion of visual and NIR data. In: Pretz, T., Wotruba, H. (eds.) 7th Sensor-Based Sorting & Control 2016, Shaker (2016)
9. Leitner, R., Rosskopf, S.: Identification of flexographic-printed newspapers with NIR spectral imaging. Int. J. Comput. Inf. Syst. Control. Eng. 2(8), 68–73 (2008). ISSN: 1307-6892
10. Rahman, M.O., Hussain, A., Basri, H.: A critical review on waste paper sorting techniques. Int. J. Environ. Sci. Technol. 11(2), 551–564 (2014). ISSN: 1735-1472
11. Rahman, M.O., Hussain, A., Scavino, E., Basri, N.E.A., Basri, H., Hannan, M.A.: Waste paper grade identification system using window features. J. Comput. Inf. Syst. 6(7), 2077–2091 (2010). ISSN: 1553-9105
12. Siesler, H.W., Ozaki, S., Kawata, Y.-a., Heise, H.M.: Near-Infrared Spectroscopy: Principles, Instruments, Applications. Wiley-VCH Verlag GmbH (2002)

Hand Gesture Recognition with Leap Motion

Lin Feng1, Youchen Du1, Shenglan Liu1(B), Li Xu2, Jie Wu1, and Hong Qiao3
1 Dalian University of Technology, Dalian, China liusl@mail.dlut.edu.cn
2 Neusoft Co. Ltd., Shenyang, China
3 Chinese Academy of Sciences, Beijing, China

Abstract. Hand gesture is a natural way for people to communicate and plays an important role in Human-Computer Interaction (HCI). Nowadays, many developers build HCI applications on top of hand gesture recognition, but making such recognition more accurate still has a long way to go. The recent introduction of depth cameras like the Leap Motion Controller (LMC) allows researchers to exploit depth information to recognize hand gestures more robustly. This paper proposes a novel hand gesture recognition system based on the LMC. Histogram of Oriented Gradient (HOG) features are extracted from binarized and undistorted Leap Motion sensor images. We feed these features into a multi-class Support Vector Machine (SVM) classifier to recognize the performed gesture. The results show that our model is much more accurate than previous work.
Keywords: Hand gesture recognition · Support Vector Machine (SVM) · Histogram of Oriented Gradient (HOG) · Leap Motion

1 Introduction

In recent years, with the enormous development in the field of machine learning, problems such as understanding human voice, language, movement, and posture have become more and more popular; hand gesture recognition, as one of these fields, has attracted many researchers' interest [1]. The hand is an important part of the human body and, as a way to supplement human language, gestures play an important role in daily life; in the fields of human-computer interaction, robotics, and sign language, how to recognize a hand gesture is one of the core issues [2–4]. In previous work, orientation histograms have been used to recognize hand gestures [5], and a variant of the Earth Mover's Distance (EMD) has also been used for this task [6]. Recently, depth cameras such as Time-of-Flight cameras and the Microsoft Kinect have been marketed one after another, and the use of depth features has been added to gesture recognition based on low-dimensional feature extraction [7]. A volumetric shape descriptor has been used to achieve robust pose recognition in real time [8], and adding features like distance, elevation, and curvature based on 3D information about the hand shape and finger posture contained in depth data has also improved accuracy [9]. Recognizing hand gestures from contours has also been explored [10], hand gesture recognition using finger segmentation has been tested [11], and using HOG features with an SVM to recognize hand gestures has also been proposed [12].

The Leap Motion Controller (LMC) is a consumer-oriented tool for gesture recognition and finger positioning developed by Leap Motion. Unlike the Microsoft Kinect, it is based on binocular visual depth and provides data on fine-grained locations such as hands and knuckles. Due to its different design concept, it only works normally at close range, but it offers good data accuracy, on the order of 0.2 mm [13]. Many studies have tried to recognize hand gestures with the LMC [14, 15]. Combining Leap Motion and Kinect for hand gesture recognition has also been proposed and achieved good accuracy [16].

Our main contributions are as follows:
1. We propose an LMC hand gesture dataset, which contains 13 subjects and 10 gestures; each gesture is repeated 20 times by each subject, giving 2600 samples in total.
2. We use the Leap Motion only. We extract HOG features from the LMC sensor images; the HOG feature significantly improves gesture recognition accuracy.

This paper is organized as follows. In Sect. 2, we give a brief introduction to our model architecture, methods, and dataset. In Sect. 3, we present the HOG feature extracted from binarized LMC sensor images. In Sect. 4, we analyze and compare the performance of the HOG feature with the work presented by Marin et al. In Sect. 5, we present the conclusions of this paper and thoughts on future work.

2 Overview

In this section, we describe the model architecture we used and the way data is handled (Sect. 2.1), and how we collected our dataset with the LMC (Sect. 2.2).

2.1 System Architecture

Figure 1 shows in detail the recognition model we designed.
For sensor images, we retrieve the sensor images from the LMC and binarize them, then we extract the HOG feature, and finally feed these features into a One-vs-One multi-class SVM to classify the hand gesture.

2.2 Hand Gesture Dataset

In order to evaluate the performance of the HOG feature of the raw sensor images, we propose a new dataset; the setup is shown in Fig. 2. The dataset contains a total of 10 gestures (Fig. 3) performed by 13 individuals; each gesture is repeated 20 times, so the dataset contains a total of 2600 samples. The tracking data and sensor images are captured simultaneously, and each individual is told to perform gestures within the LMC's valid visual range, allowing translation and rotation, with no other prior knowledge.

Fig. 1. System architecture. Fig. 2. Capture setup. Fig. 3. Gestures in dataset.

3 Feature Extraction from Sensor Images

3.1 Sensor Images Preprocessing

Barrel distortion is introduced by the LMC's hardware (Fig. 4). In order to get realistic images, we use an official method provided by Leap Motion that applies bilinear interpolation to correct the distorted images. We then apply threshold filtering to the corrected image; after doing so, the image is binarized, retaining the area of the hand and removing the non-hand area as much as possible, as shown in Fig. 5.

3.2 Histogram of Oriented Gradient

The HOG feature is a feature descriptor used for object detection in computer vision and image processing. Its essence is the statistics of image gradient information. In this paper, we use the HOG feature to extract feature information about gestures from the binarized, undistorted sensor images.

Fig. 4. Raw images from LMC. Fig. 5. Binarized images.

4 Experiments and Results

4.1 Comparison Between Different Datasets

In order to show that our dataset has a data distribution similar to previous work and no special bias towards our HOG feature, we reconstruct the calculations for features like fingertip angles, fingertip distances, and fingertip elevations from [16]; the results are shown in Table 1.

Table 1. Tracking features accuracy on both datasets
Marin et al.   79.80%
Ours           82.30%

4.2 HOG Feature with Different Classifiers

We compare the performance of the HOG feature with different classifiers, such as LR, SVM (RBF), SVM (linear), RF, KNN, and MLP. In each round, we split the dataset into an 80% training set and a 20% test set, then train these classifiers on the same data and validate their performance. The results of 10 rounds show that the SVM with RBF kernel outperforms the other classifiers by a significant margin, as shown in Table 2.

Table 2. Performance of HOG feature on different classifiers
Classifier     Precision
LR             88.15%
SVM (RBF)      96.42%
SVM (linear)   96.31%
RF             82.50%
KNN            94.69%
MLP            94.00%

4.3 SVM Details

We use the One-vs-One strategy for the multi-class SVM with RBF kernel to classify 10 classes; for each class pair there is one SVM, resulting in a total of 10*(10-1)/2 = 45 classifiers, and the final classification result is based on the votes received. For the hyper-parameters (C, γ), we use grid search on 80% of the samples with 10-fold cross-validation; C is searched from 10^0 to 10^3, and γ is searched from 10^-4 to 10^0. We present our best results with the parameters found by grid search in Table 3.

Table 3. Best results with parameters searched by grid search
Classifier     Precision
SVM (RBF)      98.27%
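As a rough illustration of this pipeline (binarization of an undistorted sensor image, HOG extraction, and a One-vs-One RBF SVM tuned by grid search over the ranges given above), the following Python sketch uses OpenCV, scikit-image and scikit-learn. It is a sketch under assumptions: the threshold value, image size and HOG parameters are illustrative choices, not the paper's settings.

```python
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def preprocess(sensor_img, thresh=80):
    """Binarize an (already undistorted) grayscale LMC sensor image;
    the threshold value is an assumption."""
    _, binary = cv2.threshold(sensor_img, thresh, 255, cv2.THRESH_BINARY)
    return binary

def hog_feature(binary_img):
    """Extract a HOG descriptor; cell/block sizes are illustrative choices."""
    resized = cv2.resize(binary_img, (128, 128))
    return hog(resized, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

def train_classifier(images, labels):
    """One-vs-One RBF SVM with the grid-search ranges given in the paper."""
    X = np.array([hog_feature(preprocess(img)) for img in images])
    grid = {'C': np.logspace(0, 3, 4), 'gamma': np.logspace(-4, 0, 5)}
    clf = GridSearchCV(SVC(kernel='rbf', decision_function_shape='ovo'),
                       grid, cv=10)
    return clf.fit(X, labels)
```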
5 Conclusions and Future Works

In this paper, we proposed an LMC hand gesture dataset which contains 13 subjects and 10 gestures. We proposed a way to extract HOG features from LMC raw sensor images using binarization and undistortion. We compared the performance of the HOG feature with different classifiers and presented the best results obtained in our experiment.

In future work, we will explore the characteristics of the tracking data; we think the characteristics of the joints will also affect the accuracy of the overall classification due to the correlation between joints. We will try to perform feature fusion between the tracking features and the HOG feature, and the results should be considerable. The current training process consumes much time in our experiment, so we will continue to optimize it by introducing techniques like removing linearly dependent features with PCA. At the same time, we will study the interaction between the system and virtual reality application scenarios.

Acknowledgments. This work was supported in part by the National Natural Science Foundation of China under Grants 61627808, 91648205, 61602082, and 61672130. This work was also supported in part by the Development of Science and Technology of Guangdong Province Special Fund Project Grant 2016B090910001 and the Open Program of the State Key Laboratory of Software Architecture (Item number SKLSAOP1701).

References
1. Rautaray, S.S., Agrawal, A.: Vision based hand gesture recognition for human computer interaction: a survey. Artif. Intell. Rev. 43(1), 1–54 (2015)
2. Ohn-Bar, E., Trivedi, M.M.: Hand gesture recognition in real time for automotive interfaces: a multimodal vision-based approach and evaluations. IEEE Trans. Intell. Transp. Syst. 15(6), 2368–2377 (2014)
3. Wan, C., Yao, A., Van Gool, L.: Hand pose estimation from local surface normals. In: European Conference on Computer Vision, pp. 554–569. Springer (2016)
4. Chaudhary, A., Raheja, J.L., Das, K., Raheja, S.: Intelligent approaches to interact with machines using hand gesture recognition in natural way: a survey. arXiv preprint arXiv:1303.2292 (2013)
5. Freeman, W.T., Tanaka, K.-i., Ohta, J., Kyuma, K.: Computer vision for computer games. In: Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, pp. 100–105. IEEE (1996)
6. Ren, Z., Yuan, J., Meng, J., Zhang, Z.: Robust part-based hand gesture recognition using Kinect sensor. IEEE Trans. Multimed. 15(5), 1110–1120 (2013)
7. Suarez, J., Murphy, R.R.: Hand gesture recognition with depth images: a review. In: RO-MAN, 2012 IEEE, pp. 411–417. IEEE (2012)
8. Suryanarayan, P., Subramanian, A., Mandalapu, D.: Dynamic hand pose recognition using depth data. In: 2010 20th International Conference on Pattern Recognition (ICPR), pp. 3105–3108. IEEE (2010)
9. Dominio, F., Donadeo, M., Zanuttigh, P.: Combining multiple depth-based descriptors for hand gesture recognition. Pattern Recognit. Lett. 50, 101–111 (2014)
10. Yao, Y., Yun, F.: Contour model-based hand-gesture recognition using the Kinect sensor. IEEE Trans. Circuits Syst. Video Technol. 24(11), 1935–1944 (2014)
11. Chen, Z.-h., Kim, J.-T., Liang, J., Zhang, J., Yuan, Y.-B.: Real-time hand gesture recognition using finger segmentation. Sci. World J. 2014 (2014)
12. Feng, K.-p., Yuan, F.: Static hand gesture recognition based on HOG characters and support vector machines.
In: 2013 2nd International Symposium on Instrumenta-tion and Measurement, Sensor Network and Automation (IMSNA), pp. 936–938. IEEE (2013) 13. Weichert, F., Bachmann, D., Rudak, B., Fisseler, D.: Analysis of the accuracy and robustness of the leap motion controller. Sensors 13(5), 6380–6393 (2013) 14. Ameur, S., Khalifa, A.B., Bouhlel, M.S.: A comprehensive leap motion database for hand gesture recognition. In: 2017 International Conference on Information and Digital Technologies (IDT), pp. 514–519. IEEE (2017) 54 L. Feng et al. 15. Wei, L., Tong, Z., Chu, J.: Dynamic hand gesture recognition with leap motion controller. IEEE Signal Process. Lett. 23(9), 1188–1192 (2016) 16. Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with leap motion and kinect devices. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 1565–1569. IEEE (2014) A Fast and Simple Sample-Based T-Shirt Image Search Engine Liliang Chan(?) , Pai Peng, Xiangyu Liu, Xixi Cao, and Houwei Cao Department of Computer Science, New York Institute of Technology, New York, USA {lchen25,ppeng,xliu24,xcao01,hcao02}@nyit.edu Abstract. In this paper, we proposed a fast and simple sample-based T-shirt image retrieval system TColor, which can e?ectively search T-shirt image by main color, and optional secondary colors. We considered several distinct prop- erties of T-shirt images. Instead of traversing all pixels on T-shirt image, we search T-shirt by color based on 12 representative pixels extracted from the esti- mated e?ective T-shirt area. We evaluated our system based on a small amount of pilot T-shirt image data. Our results indicated that the proposed system signif- icantly outperforms the straight-forward, brute force un?ltered traverse search, and obtains similar results with a much complex, time-consuming ?ltered traverse algorithm which removes the background color for t-shirt image during the search. Keywords: T-shirt image · Image search · Search engine 1 Introduction In the era of information age, there are dramatically number of images being distributed and shared over the web. As a result, many search engines have added the function of image search, such as Google, Baidu, Bing, etc. The most common approach for image search is “content-based” image retrieval, which is based on the image analysis in order to extract low-level visual properties, such as color, shape, and texture [1, 2]. Besides, other systems search images based on the visual similarity, regardless of the content of the real images [3]. The ?rst step in image retrieval is feature extraction. Most image search engines use the color space feature extractor and the composition space feature extractor to extract the image features, and then search the best image based on the similarities. During the search process, the perceptual hash algorithm is usually used to generate a “?ngerprint” string for each picture, and the similarity between images can be measured by comparing the ?ngerprints between di?erent pictures. Although image search has been successfully applied in many search engines and applications, it is not trivial and there are many challenges encountered in the search process. For example, simplifying the color and calculating the gray-scale average of pixels can take very long time on large image databases. In addition, compared with general image search, T-shirt search has some distinctive characteristics and challenges. In this paper, we proposed a fast and simple sample-based T-shirt image search engine. 
By considering several © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 55–62, 2019. https://doi.org/10.1007/978-3-030-02686-8_5 distinct properties of T-shirt image, our system can e?ectively search T-shirt by main color, and optional secondary colors. In this paper, we proposed a very fast and simple sample-based T-shirt image search engine, which can e?ectively search T-shirt by color. Compared with general image search, T-shirt search has some distinctive characteristics and challenges. For example, the t-shirt images usually have large portion of background, and the background color can cause perturbations to search accuracy. On the other hand, the t-shirt images usually have symmetrical structure, and located in the relatively ?xed position of the entire image. By considering these distinct properties of T-shirt images, we proposed a simple but e?ective search system which can search the T-shirt images by main color and optional secondary colors. For each t-shirt image, instead of traversing all pixels, we ?rst select 12 pixels based on some sampling rules derived from analyzing a small amount of pilot data, and extract the RGB data of these pixels [5]. Then we transform three-dimensional microscopic RGB data into visual colors. In the process, we chose 12 common colors, and classi?ed pixels with di?erent RGB into colors based on the Euclidean distance [6]. Meanwhile, we compute the proportion of each color and stored the information into our t-shirt image database for future search. Based on the pilot evaluation results on 200 t-shirt images, our proposed system signi?cantly outperforms the general un?ltered traverse search, and obtains similar results with much complex, time-consuming ?ltered traverse algorithms which removed the background color for t-shirt image. 2 Methods In this section, we introduce how we implement the proposed sample-based t-shirt search algorithm. 2.1 Selection of Representative Pixels Instead of traversing all pixels, our proposed sample-based t-shirt search system search t-shirt is only based on a few samples. How to select representative sampling points is very crucial for the search accuracy. Here we introduce our strategies for data sampling. First of all, as most of the t-shirts are symmetrical, we only focus on left half of the image. Chopping half of the image can obviously decrease the search time and complexity, reduce the data size from 2n to n. On the other hand, t-shirt images usually have large portion of background. We try to avoid the background area and only select data samples from the e?ective t-shirt region. In order to do that, we try to determine the relative position of T-shirt boundary in four directions (left, right, upper, and lower) based on statistical analysis on 50 pilot t-shirt images in our dataset. Figure 1(a) and 1(b) shows the histogram of the boundary locations based on the 50 pilot images, clearly indicates the range of boundary locations. Based on that, we can roughly determine the valid area of t-shirt images as shown in Fig. 2. Then we randomly sample 12 pixels from the valid area, and an example of t-shirt image and how the 12 selected pixels distributed can be found in Fig. 2 as well. 56 L. Chan et al. Fig. 1a. Histogram for left/right boundary distribution. Fig. 1b. Histogram for upper/lower boundary distribution. Fig. 2. Valid search area (left) & example of how 12 selected pixels distributed. 
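A minimal sketch of this sampling step is shown below, in Python with PIL. The valid region is expressed as relative bounds on the left half of the image; the concrete numbers in the example call are placeholders, since the paper derives its bounds from the boundary histograms of the 50 pilot images.

```python
import random
from PIL import Image

def sample_pixels(image_path, bounds, n=12, seed=None):
    """Randomly pick n sample positions inside the estimated valid T-shirt
    area of the left half of the image. `bounds` gives the valid region
    (left, right, top, bottom) as fractions of the image width/height."""
    img = Image.open(image_path)
    w, h = img.size
    left, right, top, bottom = bounds
    rng = random.Random(seed)
    return [(rng.randint(int(left * w), int(right * w) - 1),
             rng.randint(int(top * h), int(bottom * h) - 1))
            for _ in range(n)]

# Example with hypothetical bounds:
# pts = sample_pixels("tshirt.jpg", bounds=(0.15, 0.50, 0.20, 0.85))
```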
2.2 Determining the Color for Selected Pixels

For each selected pixel, we can easily get the corresponding microscopic R-G-B data with the Python Imaging Library (PIL) [4]. However, the microscopic R-G-B information is not visual enough: we need to transform the microscopic R-G-B values into macroscopic candidate colors [7–9]. In our proposed t-shirt search system, we give users 12 candidate colors to choose from, including black, white, red, orange, yellow, green, cyan, blue, purple, pink, grey and brown. Therefore, we divide the 3-dimensional R-G-B space into 12 parts based on the Euclidean distance [10]. By computing the Euclidean distance between the sampling pixel and the standard colors as in (1), the sample pixel is assigned to the color category Ci with the shortest distance:

D(sample pixel, standard color) = √((R - R')² + (G - G')² + (B - B')²),   (1)

D(P, Ci) = min_C D(P, C).   (2)

2.3 Traversal-Based T-shirt Retrieval

Two traversal-based search algorithms are implemented as well for the sake of comparison.

Unfiltered Traversal Search. We first consider the most straightforward search approach, the unfiltered traversal search. This simple brute-force approach does not take into account the background color of the t-shirt image. We traverse every pixel of the image, get the corresponding R-G-B data for each pixel, and then classify it into one of the twelve candidate colors [11].

Filtered Traversal Search. In the filtered traversal search, we try to filter out the background color of the t-shirt image. As the Euclidean distance between two pixels of obviously different colors should be much bigger than that between two pixels of similar colors, we can identify whether a pixel is located on the boundary or not by examining the Euclidean distance between the current pixel and the adjacent pixel during the traverse search. Figure 3 shows the Euclidean distance across the boundary for the 50 pilot t-shirt images in our dataset. We can see that the minimum distance across the boundary is 1500. As a result, we choose this value as the threshold to filter the background color in the filtered traversal algorithm.

Fig. 3. Euclidean distance across the boundary on 50 pilot T-shirt images.
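A minimal sketch of the nearest-standard-color assignment of Eqs. (1) and (2), in Python with PIL, is given below. The RGB values chosen for the 12 standard colors are placeholders (the paper does not list them), and the squared distance is used for the comparison, which yields the same ordering as Eq. (1).

```python
from PIL import Image

# Hypothetical RGB values for the 12 candidate colors (placeholders).
STANDARD_COLORS = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "orange": (255, 165, 0), "yellow": (255, 255, 0), "green": (0, 128, 0),
    "cyan": (0, 255, 255), "blue": (0, 0, 255), "purple": (128, 0, 128),
    "pink": (255, 192, 203), "grey": (128, 128, 128), "brown": (139, 69, 19),
}

def squared_distance(p, q):
    """Squared Euclidean distance in RGB space (same ordering as Eq. (1))."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def classify_pixel(rgb):
    """Assign a pixel to the nearest standard color (cf. Eq. (2))."""
    return min(STANDARD_COLORS,
               key=lambda c: squared_distance(rgb, STANDARD_COLORS[c]))

def color_proportions(path, sample_points):
    """Color proportions over a list of (x, y) sample positions."""
    img = Image.open(path).convert("RGB")
    counts = {}
    for xy in sample_points:
        c = classify_pixel(img.getpixel(xy))
        counts[c] = counts.get(c, 0) + 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}
```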
3 Results

3.1 Dataset

3000 T-shirt images were collected for our study. In our pilot study, 200 testing T-shirt images were labelled by human labelers. Specifically, the labelers label each T-shirt image with one of the 12 common colors, including black, white, red, orange, yellow, green, cyan, blue, purple, pink, grey and brown. The main color of the T-shirt is marked as the color occupying more than 45 percent of the T-shirt. A secondary color is one occupying less than the main color but more than 0% of the t-shirt area. Figure 4 shows the color distribution of the main color and secondary colors on the pilot test data set.

Fig. 4. Color distribution of the main color and secondary colors for the 200 pilot test T-shirt images.

3.2 Evaluations

Table 1 compares the performance of the three different search approaches. We consider two different evaluation metrics, MAP (Mean Average Precision) and MRR (Mean Reciprocal Rank). MAP is used to evaluate the system precision in general. Different from a standard image search engine, in a t-shirt search engine the accuracy of searching by main color is much more significant, so we also compute the MRR to evaluate the main color search.

Table 1. Performance of the three different search approaches
Algorithm applied to the engine    MAP    MRR for main color search
Sampling Algorithm                 0.61   0.90
Traverse Algorithm (Filtered)      0.63   0.90
Traverse Algorithm (Unfiltered)    0.52   0.78

From Table 1, we can see that the MAP for the sample-based search is 0.61, which is comparable with the 0.63 obtained by the filtered traversal search, and significantly better than the simple brute-force traversal search with a mean average precision of 0.52. Similar results can be seen for MRR. The MRR for main color search is 0.90 for the two search algorithms which benefit from removing the background colors during the search, while the MRR is much lower for the simple traversal search. Significance tests were also performed: the improvement of the proposed sample-based search over the simple traversal search is statistically significant with a p-value of 0.02, and there is no significant difference (p-value 0.15) between the sample-based search and the filtered traversal search.

We also evaluated the three different systems by testing the execution speed in the same testing environment. The results are shown in Table 2. It is clear that the search engine using the proposed sampling-based algorithm has a clear advantage in execution efficiency: its execution time is less than 1/50 of that of the other two engines. The filtered traversal search takes the longest time to search a T-shirt image among the three approaches.

Table 2. Comparison of execution speed
Algorithm applied to the engine    Average time for analyzing the color information of one T-shirt image
Sampling Algorithm                 10 ms
Traverse Algorithm (Filtered)      900 ms
Traverse Algorithm (Unfiltered)    760 ms

We are also interested in how our proposed t-shirt color search engine works on different colors, so we further break down the results for each color. Figure 5 shows the results. First of all, we can see that our proposed system performs significantly differently on different colors. For example, the system can search red and green T-shirts with very high MAP, while it did not show good performance on T-shirts in cyan, pink, grey, purple and brown.

Fig. 5. Break-down MAP (Mean Average Precision) for each color based on the proposed sample-based T-shirt image search.

4 Conclusions

This paper focuses on the T-shirt image search task. We considered several distinct properties of T-shirt images and proposed a fast and simple sample-based T-shirt image search engine, which can effectively search T-shirts by main color and optional secondary colors. Instead of traversing all pixels, our proposed sample-based t-shirt search system searches t-shirts based on only a few samples. How to select representative sampling points is crucial for the search accuracy. In this study, 12 representative pixels were extracted from the estimated T-shirt area, and several statistical analyses were performed to bound the sampling region. We evaluated our system based on 200 pilot T-shirt images. Both the MAP and MRR results indicated that the proposed system significantly outperforms the straightforward, brute-force unfiltered traverse search, and obtains results similar to a much more complex, time-consuming filtered traverse algorithm which removes the background color of the t-shirt image during the search.
We further break-down the results for each color, and the results indicate that the proposed system performs signi?cantly di?erent on di?erent colors. The system can search red, green color T-shirt with very high MAP, while it did not show good performance on purple and brown T-shirt. We also evaluated the three di?erent systems by testing the execution speed in the same testing environment. The proposed system shows clear advantage in execution e?ciency. The execution speed is less than 1/50 compared with the other two engines. In future, we will validate our proposed sample-based T-shirt search engine on large dataset with more T-shirt images. A Fast and Simple Sample-Based T-Shirt Image Search Engine 61 References 1. Veltkamp, R.C., Tanase, M.: Content-Based Image Retrieval Systems: A Survey. Technical Report UU-CS-2000-34, Dept. of Computing Science, Utrecht University (2002) 2. Ortega, M., Rui, Y., Chakrabarti, K., Mehrotra, S., Huang, T.S.: Supporting similarity queries in MARS. In: Proceedings of ACM Conference on Multimedia, pp. 403–413 (1997) 3. Terragalleria. http://www.terragalleria.com 4. Pajankar, A.: Raspberry Pi Image Processing Programming: Develop Real-Life Examples with Python, Pillow, and SciPy. Apress (2017) 5. Zhang, Q., Song, X., Shao, X., Zhao, H., Shibasaki, R.: From RGB-D images to RGB images: single labeling for mining visual models. ACM Trans. Intell. Syst. Technol. 6(2), 16 (2015) 6. Huang, X.Y., Chen, W.W.: Study on image search engine based on color feature algorithm. Adv. Mater. Res. 267, 1010–1013 (2011) 7. Huang, X., Chen, W: A modular image search engine based on key words and color features. In: Transactions on Edutainment VIII. LNCS, vol. 7220, pp. 200–209 (2012) 8. Tedore, C., Johnsen, S.: Using RGB displays to portray color realistic imagery to animal eyes. Curr. Zool. 63, 27–34 (2017) 9. Lieb, A.: Color indexing for images. US20080044081 (2008) 10. Claussen, R.: Algorithms: Euclidean algorithm. ACM (1960) 11. Leon, K., et al.: Color measurement in L*a*b* units from RGB digital images. Food Res. Int. 39(10), 1084–1091 (2006) 62 L. Chan et al. Autonomous Robot KUKA YouBot Navigation Based on Path Planning and Tra?c Signals Recognition Carlos Gordón(?) , Patricio Encalada(?) , Henry Lema(?) , Diego León(?) , and Cristian Peñaherrera(?) Facultad de Ingeniería en Sistemas, Electrónica e Industrial, Universidad Técnica de Ambato, Ambato 180150, Ecuador {cd.gordon,pg.encalada}@uta.edu.ec Abstract. We present the successful demonstration of autonomous robot KUKA YouBot navigation based on path planning and tra?c signals recognition. The integration of both capabilities path planning and tra?c signals recognition was carried out, thanks to the integration among Robot Operating System, MATrix LABoratory software and Open Source Computer Vision Library working envi- ronments. The Robot Operating System allows the simulation of the autonomous robot navigation by using Gazebo and provides the implementation of the algo- rithms in simulated and real platforms. MATrix LABoratory software improves the communication tasks taking advantage of data processing tools in the path planning process. Finally, Open Source Computer Vision Library allows the tra?c signals recognition by using the Scale-Invariant Feature Transform and Speeded-Up Robust Features algorithm. 
The integration of Robot Operating System, MATrix LABoratory software and Open Source Computer Vision Library is a promising approach to provide autonomous navigation capability in any mobile robot and in uncontrolled environments. Keywords: Autonomous navigation · KUKA YouBot Robot operating system component · Path planning · Tra?c signals recognition 1 Introduction Autonomous robot navigation (ARN) in uncontrolled environments is an extraordinary ability for any mobile robot in order to achieve a speci?c goal or perform any task without external assistance [1]. ARN requires set of subsystems which are working together, such as building a map of the surrounding world, localizing the robot and the goal point within the map, making a motion plan according to the map and the localization of the beginning and goal points, executing that plan, and be prepared when something changes during the motion execution. All the subsystems should be executed at the same time which is a challenging task for mobile robots [2]. Several working environments have been used for providing autonomous navigation with arti?cial vision techniques in robots. Among them we can mention: ROS (Robot Operating System, which is a leading development environment in robotics providing tools and libraries for the development © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 63–78, 2019. https://doi.org/10.1007/978-3-030-02686-8_6 of robotic systems) [3], Matlab (MATrix LABoratory software, which includes the Robotics System Toolbox since the R2015A Matlab’s release) [4], and OpenCV (Open Source Computer Vision Library, specially designed for the treatment, capture and visualization of images in a wide range of areas such as pattern recognition in robotics, biometrics, segmentation, etc.) [5]. Di?erent algorithms have been developed in order to integrate the set of subsystems for ARN in uncontrolled environments such as path planning [6] and Tra?c Signals Recognition [7]. On one hand, path planning is the method of ?nding the best feasible path from beginning to goal locations. This topic is of major research and di?erent techniques have been reported with the intention to implement the path planning approach. Among them, we have the Probabilistic RoadMap (PRM) which is a motion planning algorithm used to ?nd a path from start to the goal point in occupancy grid map [8]. Other path planning approaches have included Normal Probability [9], e?cient interpolation [10], and Heuristics [11] approaches. On the other hand, tra?c Signals Recognition has been required to ensure autonomous robot navigation which needs the integration of arti?cial vision techniques in order to perform the recognition task [12]. Arti?cial vision not only allows the recognition of tra?c signals but also it allows the taking decisions when robots perform the autonomous navigation and new sceneries appear in the robot’s trajectory [13]. The aim of this work is to present the viability and possibility of the integration of ROS, Matlab and OpenCV working environments in order to develop the autonomous robot KUKA YouBot navigation based on path planning and tra?c signals recognition. ROS Hydro medusa is the seventh ROS distribution release, which allows the simulation of the autonomous robot navigation by using Gazebo and provides the implementation of the algorithm in the real platform of the robot KUKA YouBot. 
Matlab with the Robotics System Toolbox improves the communication tasks with ROS, taking advantage of data processing tools in the path planning process. Besides, OpenCV allows the tra?c signals recognition by using the SIFT (Scale-Invariant Feature Transform) [14] & SURF (Speeded-Up Robust Features) [15] combined algorithm. It is important to mention that we mainly take account in reaching the goal location by the robot KUKA YouBot and we do not consider the time that the robot requires to achieve the goal location due to the fact that the path planning and tra?c signals recognition algorithms working together takes a lot of computation time. We are working further in the imple- mentation and optimization of other path planning and tra?c signals recognition algo- rithms in order to reduce the execution time. Finally, the integration of Robot Operating System, MATrix LABoratory software and Open Source Computer Vision Library is a promising approach to provide autonomous navigation capability in any mobile robot. The following sections describe all the process carried out in the demonstration of autonomous robot KUKA YouBot navigation based on path planning and tra?c signals recognition. Thus, Sect. 2 describes the Robot Operating System, MATrix LABoratory software and Open Source Computer Vision Library working environments integration. Then, Sect. 3 introduces the path planning and tra?c signals recognition implemented algorithms. Next, Sect. 4 presents the features of the robot KUKA YouBot in which all the algorithms were tested. Then, Sect. 5 explains in detail the results reached in the 64 C. Gordón et al. Simulation and Experimental testing. And ?nally, Sect. 6 summarizes the conclusions of the present work. 2 Working Environments Integration As aforementioned, in order to achieve the autonomous robot navigation approach, it was necessary the integration of ROS, Matlab and OpenCV working environments as shown in Fig. 1. ROS Hydro medusa is the seventh ROS distribution release, which offers tools and libraries for the development robotic systems. In recent years, ROS has gained wide currency for the creation of working robotic systems, not only in the laboratory but also in industry. The autonomous navigation of KUKA youbot was simulated by using gazebo simulator which is integrated with ROS. With the intention of achieving ROS integration with stand-alone Gazebo, a set of ROS packages named gazebo_ros_pkgs provides wrap- pers around the stand-alone Gazebo. They provide the necessary interfaces to simulate a robot in Gazebo using ROS messages, services and dynamic features [16]. It is important to mention that the youBot Gazebo packages incorporates geometry, kinematics, dynamics and visual models of the KUKA youBot in Universal Robotic Description Format (URDF) as well as launch files and tools needed to operate the robot in Gazebo. The Robotics System Toolbox included in Matlab provides a complete integration between Matlab, Simulink and ROS. The toolbox enables to write, compile and execute code on ROS-enable robot’s and on robots simulators like aforementioned Gazebo, allowing to generate ROS node from Simulinks model and implement it into the ROS network [17]. 
The artifi- cial vision algorithm for traffic signals recognition was implemented by using Open CV, which is the Open Source Computer Vision Library, specially designed for the treatment, capture and visualization of images in a wide range of areas such as robotics, biometrics, segmentation, human–computer interaction, monitoring and object recognition. Fig. 1. Integration of working environments. A detailed architecture of ROS, Matlab and OpenCV integration is depicted in Fig. 2. ROS is fundamentally a client/server system. It consists of a series of nodes (programs) that communicate with each other through topics (dissemination) or services (interactive communication). It is a process that provides a hard-realtime-compatible Autonomous Robot KUKA YouBot Navigation Based on Path Planning 65 Fig. 2. Architecture of ROS, Matlab and OpenCV integration. 66 C. Gordón et al. loop to control a robot mechanism, which is usually designed in a modular way, so that a system is formed by di?erent controllers as di?_drive_controller, position_controllers, force_torque_sensor_controller and others. ROS working environment mainly includes three nodes: image processing, user application and controller node. ROS Node: Image processing converts images from ROS to OpenCV format or vice versa through CvBridge, a library which enables to send or receive images with the OpenCV image processing. Also, this node obtains images with the subscribers from the publishers established in the ROS Nod: User application and sends di?erent commands with its publisher to the subscriber in the ROS Node: controller_node. ROS Node: User_application executes the communication between Client and Server via ROS Action Protocol, which is built on top of ROS messages. The client and server then provide a simple API (application program interface, which is a set of routines, protocols, and tools for building software applications) for users to request goals (on the client side) or to execute goals (on the server side) via function calls and callbacks. The User_application and controller nodes communication provides to the controller node the logical commands for being interpreted to physical actions. The ROS Action Clients send the position and trajectory information processed with the API and other tools and protocols to the Action Server of controller node. While, the ROS Publisher of the User_application node sends the commands like velocity, to the ROS Subscriber of controller node for the next stage of the process in the communication. ROS Node: Controller_node transforms commands into measures or signals that can be understood by the actuators of the robot. ROS Node: Matlab_global_node corresponds to the script or program created in Matlab, which receive the data from the controller_node process the information, and sends a new command through publisher to the controller_node in order to perform an action in the di?erent actuators in the robot KUKA youbot. OpenCV image processing handles images, which uses di?erent scripts, libraries and techniques like SIFT & SURF. The images are processed thanks to the communi- cation between the cv:Mat (OpenCV-Class to store images) and CvBridge (ROS-library to transform images formats). Finally, YouBot Hardware is the space where the robot system is represented as a combination of decoupled functional subsystems. The manipulator arm and the base platform are arranged as the combination of several joints. At the same time, each joint is de?ned as a combination of a motor and a gearbox. 
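To make the publisher/subscriber and CvBridge path just described more tangible before moving on to the hardware and algorithms, a minimal rospy node might look as follows. This is a hypothetical sketch: the node and topic names are placeholders and are not taken from the paper.

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def image_callback(msg):
    # Convert the ROS image message to an OpenCV array via CvBridge.
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    # ...hand the frame over to the OpenCV-based recognition code...
    rospy.loginfo("received frame %dx%d", frame.shape[1], frame.shape[0])

if __name__ == "__main__":
    rospy.init_node("image_processing_example")
    # Placeholder topic name; the actual camera topic depends on the setup.
    rospy.Subscriber("/camera/image_raw", Image, image_callback)
    rospy.spin()
```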
The communication with the hardware and the driver is done over a serial EtherCAT connection.

3 Implemented Algorithms

The ARN was performed by applying two algorithms: the first is the path planning algorithm and the second is the traffic signal recognition algorithm.

Considering the path planning requirement, different algorithms were studied. We can mention the probabilistic roadmap (PRM), a probabilistic method whose main virtue is its efficiency in the calculation of trajectories for robots with many degrees of freedom; it can be used either for a single query or for multiple queries [18]. There is also the Lazy PRM algorithm, a single-query variant whose pre-processing phase is quite simple, since it is not necessary to generate a complete network but only one that helps to solve the particular problem [19]. Finally, another algorithm is the rapidly exploring random tree (RRT), a sub-optimal, static, model-based, probabilistic planning algorithm that builds a single, unidirectional, tree-like graph; it starts from the starting point and expands throughout the working environment through a sampling process that looks for random points until it reaches the end point, at which point it stops [20]. The features of the cited path planning algorithms are summarized in Table 1 in terms of processing time, space-constrained solutions, robustness, and computational cost.

Table 1. Algorithms for path planning
Algorithm   Processing time (s)   Space-constrained solutions (%)   Robustness (%)   Computational cost (IPS)
PRM         Average               Low                               Low              Average
Lazy PRM    Low                   Average                           Average          Low
RRT         Average               Average                           High             High

Taking into account the features of the reviewed path planning algorithms, we chose the PRM algorithm, which provides average processing time and average computational cost. In fact, the PRM algorithm avoids increasing the processing time of the integrated ROS, Matlab and OpenCV architecture. The path planning algorithm was implemented in Matlab through the implementation of the pure pursuit algorithm using probabilistic roadmaps (PRM) for robot navigation. The flow chart of the implemented PRM algorithm is depicted in Fig. 3. First, we consider the robot and algorithm parameters, such as the robot dimensions, the start and objective points, the number of PRM nodes and the PRM minimum distance. Then, we get the image of the scenery from Gazebo and process it. Next, we generate the occupancy grid of the image processed in grayscale (0 = free, 1 = occupied). The following step is to inflate the map according to the robot dimensions. Then, it is necessary to find random paths and perform the decision process for the question: is the path empty? If it is, the map is updated and the number of nodes is incremented; otherwise, the path is free and the navigation continues until the goal location is reached.

Fig. 3. Flow chart of the PRM algorithm.
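For readers less familiar with PRM, the following Python sketch illustrates the idea on a binary occupancy grid: sample free cells, connect nearby samples whose straight-line connection is collision-free, and search the resulting roadmap. It is a generic illustration under assumptions (grid, node count, connection radius), not the Matlab Robotics System Toolbox implementation used in this work, and it omits map inflation and path optimization.

```python
import numpy as np
import networkx as nx

def collision_free(grid, p, q, steps=50):
    """Check that the straight segment p->q stays in free cells (grid == 0)."""
    for t in np.linspace(0.0, 1.0, steps):
        r, c = np.round(p + t * (q - p)).astype(int)
        if grid[r, c] == 1:
            return False
    return True

def prm_path(grid, start, goal, n_nodes=60, radius=8.0, rng=np.random):
    """Probabilistic roadmap on an occupancy grid; returns a list of waypoints."""
    free = np.argwhere(grid == 0)
    samples = [np.asarray(start), np.asarray(goal)]
    samples += [free[rng.randint(len(free))] for _ in range(n_nodes)]
    g = nx.Graph()
    for i, p in enumerate(samples):
        for j, q in enumerate(samples[:i]):
            d = np.linalg.norm(p - q)
            if d <= radius and collision_free(grid, p.astype(float), q.astype(float)):
                g.add_edge(i, j, weight=d)
    idx = nx.shortest_path(g, 0, 1, weight="weight")   # raises if no path is found
    return [tuple(samples[i]) for i in idx]
```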
Different algorithms were then reviewed in order to implement the traffic signals recognition requirement. Among them we can mention the Binary Robust Independent Elementary Features (BRIEF) algorithm, which works with strings of bits to describe characteristic points. For this reason, the BRIEF algorithm is much faster than the SIFT and SURF algorithms; it also reduces the complexity of the matching and detection process between images, which lets low-powered devices run this algorithm [21]. It is important to mention that the BRIEF algorithm is not invariant to rotation, because it can only handle a maximum difference of 10 to 15 degrees. Another interesting algorithm is the Oriented FAST and Rotated BRIEF (ORB) algorithm. ORB was created from BRIEF and was modified in order to be invariant to rotation and robust against noise [22]. This method uses the FAST (Features from Accelerated Segment Test) detector to obtain points and the BRIEF descriptor; as a result, ORB can run on devices with reduced processing capacity. Finally, an advanced algorithm is the combination of the Scale-Invariant Feature Transform and Speeded-Up Robust Features algorithms. The SIFT & SURF algorithm allows automatic traffic signal detection in real time [23]. The main advantage of this algorithm is that the extraction of interest points is acceptable and provides the best features with respect to scale, illumination and rotation. As an added value, the SIFT & SURF algorithm provides higher robustness, as indicated by lower BER values [24, 25]. The features of the studied algorithms for traffic signals recognition in terms of processing time, accuracy, robustness, computational cost and rotation are summarized in Table 2.

Table 2. Algorithms for traffic signals recognition
Algorithm     Processing time (s)   Accuracy (dispersion, s)   Robustness (%)   Computational cost (IPS)   Rotation (degrees)
BRIEF         High                  Medium                     Low              Low                        10°–15°
ORB           High                  Medium                     Medium           Low                        Invariant
SIFT & SURF   Low                   High                       High             High                       Invariant

Finally, the traffic signals detection system was implemented in OpenCV with the SIFT & SURF algorithm, considering its features of high accuracy, high robustness and invariance to rotation. It is important to mention that in the present work we do not consider the processing time and computational cost features; we are working further to reduce processing time and computational cost with other algorithms in future studies.

4 Robot KUKA YouBot

The integration of ROS, Matlab and OpenCV was implemented experimentally on the KUKA youBot, which is an open, expandable and modular robotic system. This robot is specially developed for research purposes with emphasis on robotics. The KUKA youBot mainly consists of an omnidirectional platform, a robotic arm with five degrees of freedom, and a gripper with two fingers, as depicted in Fig. 4. All the data acquisition and the experimental demonstration were carried out in the robotics laboratory of the Technical University of Ambato in Ecuador.

Fig. 4. KUKA YouBot, available at the Technical University of Ambato in Ecuador.

5 Simulation and Experimental Results

The simulation of the system using Gazebo consists in having the robot with its actuators and sensors in a three-dimensional environment, where the traffic signals are placed so that they have line of sight with the camera. The procedure begins with the modeling of the robot, which is obtained from the YouBot Store repository, and of its surroundings with 3D models made in SketchUp and Blender. These models must be managed by Gazebo, for which the physical properties of the 3D objects, such as mass, inertia, texture, shape and color, are configured in the .config and .sdf files so that they can be imported into the Gazebo workspace, where we can already use them to assemble the navigation environment of the mobile robot.
Finally, we can execute the movement control scripts of both the omnidirectional platform and the robotic arm. The pictures of the simulation of robot KUKA Youbot in gazebo environment are depicted as follows. Figure 5(a) sketches the robot KUKA Youbot in 3D environment. Figure 5(b) depicts a zoom in of the robot KUKA Youbot in 3D environment. Figure 5(c) shows the robot KUKA Youbot closed Fig. 5. Gazebo Simulation. (a) Robot KUKA Youbot in 3D environment. (b) Zoom in of robot KUKA Youbot in 3D environment. (c) Robot KUKA Youbot and stop tra?c signal. (d) Robot KUKA Youbot and one way tra?c signal. Autonomous Robot KUKA YouBot Navigation Based on Path Planning 71 to the stop tra?c signal. And Fig. 5(d) depicts the robot KUKA Youbot closed to the one way tra?c signal. The path planning algorithm was implemented in the created road map (25 m * 20 m) which is depicted in Fig. 6. Where we observe depicted with asterisks, the start location, goal location, one way tra?c signal and stop tra?c signal within the map. The purpose of one way tra?c signal is the path changing and the stop tra?c signal is to wait for 60 s before continuing the path. Moreover, the result of PRM algorithm applied in the prob- abilistic road map is depicted in Fig. 7, in which we are able to identify 60 nodes. We do not use greater number of nodes with the intention of reducing the computational e?ort. We are able to detect the path in orange line. It is important to mention that the solutions provided by PRM are not the optimal path. Also, the optimized path is depicted in green dashed line obtained via mean square optimization. Besides, we have the real trajectory in red continuous line performed by the robot KUKA YouBot, we mainly appreciate the changes in the trajectory due to the tra?c signal detection and taking decisions. It is necessary to mention that we avoid some features like the proximately to walls and other objects in order to reduce complexity. Fig. 6. Road Map with start, goal and tra?c signals locations. 72 C. Gordón et al. Fig. 7. Road Map with PRM execution. Probabilistic Path in orange line, Optimized path in green dashed line, Real trajectory in red continuous line. The arti?cial vision techniques based on SIFT & SURF algorithm allowed performing the Tra?c Signals Recognition in the real platform execution. The summar- ized process was carried out in the following way. First, it is necessary to have the pattern library of the tra?c signals. The pattern of the Stop tra?c signal is sketched in Fig. 8(a). Second, it is the acquisition of the image with a Microsoft HD camera located in the ?ngers of the gripper, when the KUKA YouBot is executing the path. The obtained image from the camera is sketched in Fig. 8(b). Third, it is the extraction of the features of the pattern image. Figure 8(c) shows the features extraction from the pattern. Fourth, it is the extraction of the features of the obtained image from the camera which is depicted in Fig. 8(d). Fourth, it is the comparison of the features between the two previous extractions, Fig. 8(e) depicts the feature comparison. Finally, we have the detection result of the tra?c signal, which is shown in Fig. 8(f). Autonomous Robot KUKA YouBot Navigation Based on Path Planning 73 a) b) c) d) e) f) Fig. 8. SIFT & SURF algorithm execution, (a) Pattern of the Stop tra?c signal, (b) Obtained image from the camera, (c) Features extraction from the pattern, (d) Features extraction from the obtained image, (e) Feature comparison Pattern, and (f) Detection result. 
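The pattern-matching step illustrated in Fig. 8 can be sketched with OpenCV as follows. This is an illustrative sketch only: it uses SIFT (available as cv2.SIFT_create in OpenCV 4.4+), since SURF is only provided by the opencv-contrib build, and the ratio-test threshold and minimum match count are assumptions rather than the paper's values.

```python
import cv2

def detect_sign(pattern_path, frame, min_matches=10):
    """Return True if enough SIFT keypoint matches between the traffic-sign
    pattern and the camera frame survive Lowe's ratio test."""
    pattern = cv2.imread(pattern_path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(pattern, None)
    kp2, des2 = sift.detectAndCompute(gray, None)
    if des1 is None or des2 is None:
        return False

    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    return len(good) >= min_matches
```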
Also, the pictures when the robot KUKA YouBot meets the traffic signals during the real test are depicted in Figs. 9 and 10. The KUKA YouBot with the one-way traffic signal is sketched in Fig. 9, while Fig. 10 shows the moment when the robot reaches the place where the stop traffic signal is located. In both the simulation and the real platform, the tasks were performed with an average linear velocity of around 0.20 m/s and an average angular velocity of around 0.45 rad/s. The average time for the robot KUKA YouBot to reach the goal was around 2 min. We mainly take into account whether the robot KUKA YouBot reaches the goal location, and we do not consider the time that the robot requires to achieve it, due to the fact that the path planning and traffic signal recognition algorithms working together take a lot of computation time and effort. We are working further on the implementation and optimization of other path planning and traffic signal recognition algorithms in order to reduce the execution time. In addition, we are looking into the implementation of machine learning algorithms in order to improve the ability to recognize all available traffic signals.

Fig. 9. Robot KUKA YouBot and stop traffic signal.

Fig. 10. Robot KUKA YouBot and one-way traffic signal.

6 Conclusions

In conclusion, autonomous robot KUKA YouBot navigation based on path planning and traffic signal recognition has been presented. The integration of both capabilities, path planning and traffic signal recognition, was achieved by integrating the ROS, MATLAB and OpenCV working environments. ROS allowed the simulation of the autonomous robot navigation using Gazebo and provided the implementation of the algorithm on the real robot KUKA YouBot platform. MATLAB improved the communication tasks by taking advantage of its data processing tools in the path planning process. Finally, OpenCV allowed the traffic signal recognition by using the SIFT & SURF algorithm. We have successfully demonstrated that the integration of ROS, MATLAB and OpenCV is a promising approach to provide autonomous navigation capability to any mobile robot. Finally, it is important to mention that the capability of traffic signal recognition opens new areas of research in the fields of artificial intelligence and object recognition, due to the fact that the fundamentals of traffic signal recognition can be applied to the recognition of other kinds of objects.

Acknowledgement. The authors acknowledge the Technical University of Ambato in Ecuador for providing all support and facilities, including the robot KUKA YouBot.

References

1. Perez, A., Karaman, S., Shkolnik, A., Frazzoli, E., Teller, S., Walter, M.R.: Asymptotically-optimal path planning for manipulation using incremental sampling based algorithms. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4307–4313 (2011)
2. Corke, P.: Integrating ROS and MATLAB. IEEE Robot. Autom. Mag. 22(2), 18–20 (2015)
3. Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T.: ROS: an open-source robot operating system. In: ICRA Workshop on Open Source Software, vol. 3, no. 2, p. 5 (2009)
4. MATLAB: Robotics System Toolbox. http://mathworks.com/help/robotics/index.html. Accessed 21 Mar 2018
5. Bradski, G., Kaehler, A.: OpenCV. Dr. Dobb's Journal of Software Tools (2000)
6. Kumar, N., Zoltán, V., Szabó-Resch, Z.: Robot path pursuit using probabilistic roadmap.
In: IEEE 17th International Symposium on Computational Intelligence and Informatics (CINTI), pp. 000139–000144 (2016)
7. Adorni, G., Monica, M., Agostino, P.: Autonomous agents coordination through traffic signals and rules. In: IEEE Conference on Intelligent Transportation System (ITSC 1997), pp. 290–295 (1997)
8. Kavraki, L.E., Švestka, P., Latombe, J.C., Overmars, M.H.: Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. Robot. Autom. 12(4), 566–580 (1996)
9. Amith, A.L., Singh, A., Harsha, H.N., Prasad, N.R., Shrinivasan, L.: Normal probability and heuristics based path planning and navigation system for mapped roads. Procedia Comput. Sci. 89, 369–377 (2016)
10. Akulovi, M., Ikeš, M., Petrovi, I.: Efficient interpolated path planning of mobile robots based on occupancy grid maps. IFAC Proc. 45(22), 349–354 (2012)
11. Jun, J.Y., Saut, J.P., Benamar, F.: Pose estimation-based path planning for a tracked mobile robot traversing uneven terrains. Robot. Auton. Syst. 75, 325–339 (2016)
12. Mahadevan, S.: Machine learning for robots: a comparison of different paradigms. In: Workshop on Towards Real Autonomy, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 1996) (1996)
13. Lidoris, G., Rohrmuller, F., Wollherr, D., Buss, M.: The Autonomous City Explorer (ACE) project—mobile robot navigation in highly populated urban environments. In: IEEE International Conference on Robotics and Automation (ICRA 2009), pp. 1416–1422 (2009)
14. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
15. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: European Conference on Computer Vision, pp. 404–417. Springer, Berlin (2006)
16. Craig, C.: A Robotics Framework for Simulation and Control of a Robotic Arm for Use in Higher Education. MS in Computer Science Project Reports (2017)
17. Galli, M., Barber, R., Garrido, S., Moreno, L.: Path planning using Matlab-ROS integration applied to mobile robots. In: IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), pp. 98–103 (2017)
18. Kavraki, L.E., Latombe, J.C.: Probabilistic roadmaps for robot path planning. In: Practical Motion Planning in Robotics: Current Approaches and Future Directions, pp. 1–21 (1998)
19. Bohlin, R., Kavraki, L.E.: Path planning using lazy PRM. In: Proceedings of the IEEE International Conference on Robotics and Automation, vol. 1, pp. 521–528 (2000)
20. LaValle, S.M.: Rapidly-exploring random trees: a new tool for path planning. Technical report TR 98-11, Iowa State University (1998)
21. Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Proceedings of the 11th European Conference on Computer Vision, ser. ECCV 2010, pp. 778–792. Springer, Berlin (2010)
22. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: IEEE International Conference on Computer Vision (ICCV) (2011)
23. Dreuw, P., Steingrube, P., Hanselmann, H., Ney, H.: SURF-face: face recognition under viewpoint consistency constraints. In: BMVC, pp. 1–11 (2009)
24. Tareen, S.A.K., Saleem, Z.: A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. In: International Conference on Computing, Mathematics and Engineering Technologies (iCoMET) (2018)
25. Zrira, N., Hannat, M., Bouyakhf, E.
H., Ahmad, H.: 2D/3D object recognition and categorization approaches for robotic grasping. In: Advances in Soft Computing and Machine Learning in Image Processing, pp. 567–593. Springer, Cham (2018)

Towards Reduced Latency in Saccade Landing Position Prediction Using Velocity Profile Methods

Henry Griffith1(✉), Subir Biswas1, and Oleg Komogortsev2
1 Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
{griff561,sbiswas}@msu.edu
2 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
ok@msu.edu

Abstract. Saccade landing position prediction algorithms are a promising approach for improving the performance of gaze-contingent rendering systems. Amongst the various techniques considered in the literature, velocity profile methods operate by first fitting a window of velocity data obtained at the initiation of the saccadic event to a model profile known to resemble the empirical dynamics of the gaze trajectory. The research described herein proposes an alternative approach to velocity profile-based prediction aimed at reducing latency. Namely, third-order statistical features computed during a finite window at the saccade onset are mapped to the duration and characteristic parameters of the previously proposed scaled Gaussian profile function via a linear support vector machine regression model, with an offline fitting process performed over the entire saccade duration. Prediction performance is investigated for a variety of window sizes for a data set consisting of 9,109 horizontal saccades of a minimum mandated data quality induced by a 30-degree step stimulus. An RMS saccade amplitude prediction error of 1.5169° is observed for window durations of one-quarter of the saccade duration using the newly proposed method. Moreover, the method is demonstrated to reduce prediction execution time by three orders of magnitude versus techniques mandating online fitting.

Keywords: Eye movement prediction · Gaze-contingent rendering · Foveated rendering

1 Purpose

While gaze-contingent rendering systems (GCRS) offer tremendous potential for enhancing the user experience in virtual reality (VR) environments, latency concerns during saccadic eye movements remain an area of open interest in the academic literature [1, 2]. To address these limitations, a variety of techniques for predicting the landing position at the onset of saccadic events continue to be proposed [3, 4]. A subclass of these techniques develops predictions based upon fitting kinematic gaze data to a characteristic function known to resemble the empirical dynamics of saccadic trajectories [5]. Approaches fitting eye velocity data to a model profile consistent with the main sequence relationship between saccade velocity, amplitude, and duration [6], hereby referred to as velocity profile methods, have been previously considered. While promising in principle with respect to their capacity to produce physiologically-meaningful gaze location estimates across the entire saccadic duration through direct integration, current approaches instead use profile parameters obtained from the fitting process as predictor values in a linear regression model for amplitude determination.
Furthermore, application of this technique assumes the feasibility of performing the requisite optimization for fitting in an online capacity, which may prove challenging depending upon the computational capacity of the deployment hardware, along with the specific profile model and optimization algorithm utilized [7].

The research described herein seeks to address these concerns by introducing an alternative technique for velocity profile-based prediction of saccade landing position. The proposed approach performs the requisite profile fitting process in an offline training process. This modification allows for fitting over the entire saccade duration, thereby improving adherence to the model profile versus online methods fitting only over the initial portion of the saccade. Using these results, linear support vector machine regression models are developed which map simplistic features computed over a finite duration window occurring near the saccade onset to both the parameter sets defining the profile function, along with the saccade duration. These models are subsequently utilized in online operation, thus providing physiologically-meaningful estimates of the saccadic trajectory throughout its duration without requiring the previously mandated online fitting process. Results are presented for a data set consisting of 9,109 horizontal saccades induced by a 30-degree step stimulus, each subjected to specified quality inclusion criteria. Details regarding the experimental procedure, data quality filtering, algorithm development and analysis, and plans for further research are provided in the remainder of this manuscript.

2 Background/Significance

Eye tracking technology has long been employed across a variety of research domains. Specific applications range from fundamental endeavors, such as exploring the nature of information processing through the human visual system (HVS) [8], to more applied efforts, including applications in visual marketing [9] and biometrics [10]. Commercial interest in the technology has recently accelerated, as indicated by considerable acquisition activity in the space (i.e. Google's acquisition of Eyefluence, Facebook's acquisition of Eye Tribe, Apple's acquisition of SensoMotoric Instruments (SMI), etc.). Amongst emerging applications, eye tracking is especially promising for integration within VR environments, due to its potential to improve display performance through application of gaze-contingent rendering paradigms [11].

GCRS operate by varying display content as a function of the user's assumed point of gaze, which is obtained through use of an eye tracker. Such foveated rendering strategies exploit the inherent asymmetry in visual acuity across the HVS, where high quality vision is isolated in the center of the field. This asymmetry is associated with the dense concentration of photoreceptors in the fovea, along with the supporting processing capacity throughout the remainder of the visual pathway [12]. While GCRS have received attention in the literature for research investigating the unique contributions of central and peripheral vision during various tasks (i.e. reading [13], visual search [14], etc.), commercial applications seeking to enhance display performance through improved efficiency and reduced latency have also been considered. Namely, studies modulating various determinants of display quality, such as spatial resolution [15] and color [16], have been investigated.
While the specifications of display and eye tracking hardware are continuously improving, system latency remains a fundamental limitation for implementing GCRS [17]. Latency concerns are especially pronounced during the rapid eye movements between points of fixation known as saccades, where substantial misalignment between the optimized display region and true gaze location may occur. While saccadic suppression is generally believed to mitigate the effect of misalignment by reducing the sensitivity of the HVS during the saccadic event, examples of intrasaccadic perception have been noted in the literature [18, 19]. Moreover, such misalignments are problematic after the saccade has ended, as evidence suggests that perception is restored rapidly (between 10 and 50 ms) after completion [20]. To help avoid misalignments in the presence of saccades, GCRS may utilize saccade landing position prediction (SLPP) techniques, in which the subsequent display update is adjusted based upon the anticipated gaze landing point. Predictions are performed at the initiation of the saccadic event as identified using various online eye movement classification algorithms (i.e. I-VT, etc.) [21].

A variety of techniques for SLPP have been proposed in the literature over the prior two decades. While diverse in their approach, recent research [4] has proposed a partition of current methods into those regressing data onto a specific model motivated by the anatomy and physiology of the underlying oculomotor system, and those which operate independently of such models. With respect to model-based algorithms, techniques leveraging functions derived from an underlying oculomotor plant model [22], along with those approaches which assume a model profile function based upon empirical observations of eye movement trajectories, have been proposed [5, 7]. Amongst the latter class of solutions, algorithms performing standard linear regression [3], along with an approach based upon a Taylor series expansion [4], have been demonstrated.

3 Methods

3.1 Experimental Procedure

Data was obtained from an eye-tracking study conducted at Texas State University in 2014 under a protocol approved by the Institutional Review Board. A total of 335 participants (178 male, 157 female), ranging in age from 18 to 46, were initially enrolled in the study, which required completion of a variety of tasks aimed at investigating multiple oculomotor behaviors of interest (i.e. performing horizontal and oblique saccades under the induction of a stimulus, reading, etc.). Of those initial enrollees, 322 participants completed two consecutive sessions of the horizontal stimulus (HS) task under consideration within this research.

Within the HS task, saccades were induced by varying a stimulus along the horizontal axis of a 474 × 297 mm (1680 × 1050 pixel) Viewsonic 22″ display in a 30-degree step-wise fashion. Participants were positioned 550 mm from the black background display. The utilized stimulus was a white circle of diameter corresponding to approximately 1° of the visual angle, which enclosed a smaller black circle to promote focus at the center. Beginning at the origin, the stimulus displaced horizontally, oscillating between −15° and 15° for 100 iterations, remaining stationary for 1 s between each step. Oculomotor behaviors were recorded using an SR EyeLink 1000 eye tracking sensor.
The sensor performs monocular eye tracking at a sampling rate of 1000 Hz, with a specified typical accuracy of 0.25–0.50° and a spatial resolution of 0.25° during saccadic events. An example of the raw data output of the eye tracker over a HS task session is depicted in Fig. 1.

Fig. 1. Sample eye tracker output (Subject 1, Trial 1).

3.2 Data Inclusion Criteria

To ensure adequate data quality, inclusion criteria were established at both the session and event level. Namely, session-level data was screened according to the mean accuracy computed during post-calibration verification, along with the portion of lost data and spatial precision computed during each session. Intra-recording precision was computed as the root-mean-square (RMS) value of the inter-sample angular distances [23] occurring during classified inter-stimuli fixation events of at least 500 ms duration, with fixation events identified using an offline eye movement classifier described in [24]. A visualization of two classified fixation events of varying duration occurring during the stimulus stationary period is depicted in Fig. 2.

Fig. 2. Visualization of varying duration fixation events occurring during the stationary stimulus interval for precision computation.

The distribution of all three session-level inclusion metrics across the 644 sessions is depicted in Fig. 3, with the associated inclusion thresholds summarized in Table 1.

Fig. 3. Distribution of session-level data quality inclusion metrics across the candidate data set.

To produce a symmetrical data set (i.e. two sessions per participant), the matching session for each participant was also removed for records violating session-level inclusion criteria. The resulting data set after preliminary quality filtering consisted of 91 subjects, having a mean accuracy of 0.3908° ± 0.1044°, a portion of lost data during recording of 0.8724% ± 0.7570%, and a precision of 0.0149° ± 0.0058° (mean ± std).

Table 1. Session-level data quality thresholds
Data quality inclusion metric           | Threshold value
Maximum mean accuracy                   | 0.6° of the visual angle
Maximum proportion of lost data samples | 3%
Minimum intra-recording mean precision  | 0.05°

Additional inclusion criteria were applied on the saccadic event level, with events identified using the aforementioned offline eye movement classification algorithm. Namely, all classified saccades whose amplitudes were not consistent with the induced stimulus (i.e. corrective saccades, partitions of the stimulus interval into two saccadic events, etc.) were discarded. Moreover, events exhibiting any lost data samples, or physiologically infeasible eye velocities, were also removed from the analysis set. Finally, to remove scenarios in which classifier timing errors may corrupt results due to either delayed detection or premature termination, a maximum initial and final velocity value was also mandated. Saccadic event-level exclusion criteria are summarized in Table 2.

Table 2. Event-level data quality thresholds
Data quality inclusion metric       | Threshold value
Allowable amplitude range           | 28°–32°
Maximum number of lost data samples | 0%
Maximum velocity                    | 800°/s
Maximum initial and final velocity  | 100°/s

The aggregate application of session and event level data inclusion criteria produced an analysis data set of 9109 saccades. The distribution of amplitudes of classified saccadic events in both the original and analysis data sets is depicted in Fig. 4.
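As a concrete illustration of the precision metric used in the session-level screening above, the following MATLAB fragment computes the RMS of the inter-sample angular distances for a single classified fixation event; the variable names and the assumption that gaze samples are already expressed as angular positions in degrees are ours, not taken from the paper.

```matlab
% Sketch only: RMS inter-sample precision for one fixation event, assuming
% x and y are column vectors of gaze angles (deg) sampled at 1000 Hz during
% a classified fixation of at least 500 ms.
dx = diff(x);
dy = diff(y);
interSample = sqrt(dx.^2 + dy.^2);            % angular distance between consecutive samples
precisionRMS = sqrt(mean(interSample.^2));    % RMS of inter-sample angular distances (deg)
```

The session-level value would then be obtained by averaging this quantity over all qualifying fixation events in the recording.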
Fig. 4. Distribution of saccade amplitudes for the entire classifier output and the analysis subset.

3.3 Analysis Methods

The scaled Gaussian velocity profile specified in (1), originally introduced in [7] for SLPP applications, was employed as a model velocity function within this work:

$$\hat{v}(t;p) = a \cdot e^{-\left(\frac{t-b}{c}\right)^{2}} \tag{1}$$

where $p = [a, b, c]^{T}$ denotes the characteristic parameter vector of the profile function, $a$ is a scaling parameter representing the maximum saccade velocity, $b$ is a location parameter representing the time of occurrence of the maximum velocity, and $c$ is a shape parameter related to the width of the profile.

To begin, an offline procedure was performed for each element of the analysis set, where optimal parameter values were computed by fitting the velocity data of each sample over the entire saccadic event to the profile function in (1) via non-linear least squares optimization, as specified in (2):

$$\min_{p} \sum_{i} r_{i}^{2}, \quad r_{i} = v_{i} - \hat{v}(t_{i};p), \quad \text{s.t. } p_{i} \in I_{i} \tag{2}$$

where $\sum_{i} r_{i}^{2}$ is the residual sum of squares loss function, $v$ is the velocity data computed from the eye tracker output using a second order Savitzky–Golay filter, and $I_{i}$ is the interval bound on the $i$th component of the parameter vector. To control for variability associated with classifier performance with respect to detection of the saccade onset, all records were adjusted such that any preliminary data for which the radial velocity was below 20° per second was truncated (i.e. reducing excessive data in the case of premature detection; no such adjustments were performed for late detection cases, as they were addressed in the data pre-filtering process). Interval bounds were established using physiological information and empirical analysis as a function of the local data profile as follows: $a \in [0.9\,v_{max},\ 1.1\,v_{max}]$, $b \in [0.7\,\tfrac{D}{2},\ 1.3\,\tfrac{D}{2}]$, $c \in [0,\ 1.3\,\tfrac{D}{2}]$, where $v_{max}$ is the maximum value of the velocity sample, and $D$ is the duration of the velocity sample. All fitting operations were performed using the MATLAB fit function, which performs non-linear least squares optimization using the Levenberg–Marquardt algorithm.

Next, a feature set based upon the third-order statistics of the windowed time series was computed for the three durations of interest, as given in (3):

$$X_{W} = \left[v_{W}^{*},\ n_{v}\big|_{v_{W}=v_{W}^{*}},\ s(v_{W}),\ k(v_{W}),\ a_{W}^{*},\ n_{a}\big|_{a_{W}=a_{W}^{*}},\ s(a_{W}),\ k(a_{W})\right]^{T} \tag{3}$$

where $v_{W}$ and $a_{W}$ denote the fixed windowed velocity and acceleration data (determined as the traditional derivative of the velocity signal) of duration $W$, $(\cdot)^{*}$ denotes the maximum value of the windowed time series, and $s(\cdot)$ and $k(\cdot)$ denote the standard deviation and skewness operators, respectively. For the current experiment, the considered window durations were $W \in \{\tfrac{D}{2}, \tfrac{D}{4}, \tfrac{D}{8}\}$. The feature set was chosen in an ad-hoc fashion on the basis of preliminary simplicity, along with initial analysis and supporting domain intuition.

Once the feature set had been computed for the various window durations, predictive linear support vector machine regression models, one for each element of the characteristic parameter set and one for the saccade duration (indexed $j \in \{1, 2, 3, 4\}$), were developed. All models were obtained using the fitrsvm function in MATLAB under default algorithm hyperparameters, with 5-fold cross validation performed. A summary of the proposed modified prediction workflow versus the previously proposed online method is depicted in Fig. 5.
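To make the offline stage above concrete, the following MATLAB sketch fits one saccade's velocity trace to the scaled Gaussian profile and assembles the feature vector of (3) for a quarter-duration window. It assumes t and v are column vectors of time (s) and velocity (deg/s) for a single saccade, that X and Y hold the stacked features and one regression target across all saccades, and it relies on MATLAB's built-in gauss1 model, whose form a·exp(−((x−b)/c)²) matches (1). It is an illustration of the described workflow, not the authors' code.

```matlab
% Sketch only: offline profile fit and feature/SVM training for one target.
vmax = max(v);
D    = t(end) - t(1);

% Fit the scaled Gaussian profile (1) over the whole saccade, with the
% interval bounds given in the text.
lb = [0.9*vmax, 0.7*D/2, 0      ];
ub = [1.1*vmax, 1.3*D/2, 1.3*D/2];
g  = fit(t, v, 'gauss1', 'Lower', lb, 'Upper', ub);
p  = [g.a1, g.b1, g.c1];                       % characteristic parameters for this saccade

% Feature vector (3) for a fixed onset window of duration W = D/4.
w  = t <= t(1) + D/4;
vw = v(w);
aw = diff(vw) ./ diff(t(w));                   % acceleration as the velocity derivative
Xw = [max(vw), find(vw == max(vw), 1), std(vw), skewness(vw), ...
      max(aw), find(aw == max(aw), 1), std(aw), skewness(aw)];

% With features X (one row per saccade) and a target Y (a, b, c, or D),
% train a cross-validated linear SVM regression model as described above.
mdl = fitrsvm(X, Y, 'KernelFunction', 'linear', 'KFold', 5);
```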
A visualization of profile estimates for the various fixed window durations considered herein is depicted in Fig. 6. As noted, all three symmetrical estimates are unable to model the demonstrated skewness of the velocity data associated with large amplitude saccades.

Fig. 5. Proposed workflow for modified velocity profile-based SLPP.

Fig. 6. Predicted velocity profiles for varying window durations.

4 Results

The online amplitude estimation procedure depicted in Fig. 5 was employed across the entire analysis data set. The RMS error of the saccade amplitude prediction was used as a metric to evaluate prediction accuracy. The requisite computational time, as quantified using the internal timer available in MATLAB through the native tic and toc functions (Intel i7-7500U processor, 16 GB RAM), was also recorded. Amplitude estimates were formulated on kinematic principles as denoted in (4); integrations were estimated numerically in MATLAB using the trapz function:

$$E_{i} = \hat{A}_{i} - A_{i} = \int_{0}^{D} \hat{v}(t)\,dt - A_{i} \tag{4}$$

To perform preliminary benchmarking of the efficacy of the proposed approach, amplitude estimates were also developed using a variation of the technique described in [7]. Namely, fitting of the velocity data for fixed window durations was performed online in a manner identical to that presented above for the offline training procedure introduced herein. A linear regression model (performed using MATLAB's fitlm function) was then developed using 5-fold cross validation to estimate the saccade amplitude as a function of the four parameters proposed in the original work, as estimated from the online fitting procedure (i.e. $a$, $b$, $c$, and $c/a$). It should be noted that this method does not provide an estimate of the velocity trajectory over the remainder of the saccade duration, due to its inability to directly estimate the saccade duration. This benchmarking approach differs slightly from that originally proposed in [7], in that a rolling window with convergence criteria is replaced by the fixed windows to promote comparability between the two methods.

The RMSE of the amplitude predictions is presented in Table 3 for both the newly proposed method and the benchmarking algorithm. Corresponding mean execution times required for each prediction are presented in Table 4. While the traditional method produces improved accuracy bounded by a factor of 2 across the various durations considered, the newly proposed method reduces execution time by three orders of magnitude for the computational workflow (i.e. algorithm and architecture parameters) used in this analysis. Furthermore, for both methods, inclusion of a larger portion of the saccade duration within the prediction provides either limited or no marginal improvement in prediction accuracy. For the newly proposed method, the reduction in accuracy observed when expanding the window duration from D/4 to D/2 may be associated with a reduction in the diversity of the considered feature set (for example, in the limiting sense where the duration includes the profile peak, the maximum velocity feature should be nearly identical across saccades, as suggested by the main sequence relationship for the constant step stimulus used in data generation).

Table 3. Comparative RMSE accuracy
Window duration | RMSE(E_i), New method (°) | RMSE(E_i), Traditional method (°)
W = D/8         | 1.6917                    | 0.9758
W = D/4         | 1.5169                    | 0.9624
W = D/2         | 1.7006                    | 0.9408

Table 4. Comparative mean execution times
Window duration | Mean exec. time, New method (s) | Mean exec. time, Traditional method (s)
W = D/8         | 19.1 × 10⁻⁶                     | 32.2 × 10⁻³
W = D/4         | 18.3 × 10⁻⁶                     | 29.1 × 10⁻³
W = D/2         | 17.1 × 10⁻⁶                     | 29.5 × 10⁻³
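For completeness, a minimal MATLAB sketch of the online amplitude estimate in (4) is given below; aHat, bHat, cHat, and dHat stand for the profile parameters and duration predicted by the regression models, and A for the reference amplitude, all of which are assumed variable names rather than the authors' notation.

```matlab
% Sketch only: amplitude estimate (4) from the predicted profile parameters.
tGrid = linspace(0, dHat, 1000);                    % time grid over the predicted duration
vHat  = aHat .* exp(-((tGrid - bHat) ./ cHat).^2);  % predicted velocity profile (1)
AHat  = trapz(tGrid, vHat);                         % numerical integration (trapz, as in the text)
E     = AHat - A;                                   % amplitude prediction error

% Per-prediction execution time can be measured with MATLAB's native timer:
tic; AHat = trapz(tGrid, vHat); elapsedSeconds = toc;
```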
Preliminary investigation has been conducted attempting to identify common sources of error in amplitude estimates for the newly proposed method. Namely, manual investigation of the worst-case estimates across the data set has been performed, with initial analysis indicating that estimates are particularly corrupted for those velocity profiles deviating from the ideal case (i.e. noisy profiles whose dynamics are not consistent with the ideal scenario of a concave function). While additional pre-filtering may be utilized to remove these results in subsequent analysis attempting to quantify the best-case performance of the proposed approach, options for best handling such noisy profiles in practice are of primary concern in future research.

5 Conclusions

A novel method for reducing the latency of existing velocity profile-based SLPP algorithms is introduced and explored herein. Rather than performing the requisite fitting process for determination of the characteristic parameter set in real time, the proposed method uses linear SVM mappings relating simplistic third-order statistical features computed during fixed duration windows at the saccade onset to both the profile's characteristic parameter set and the saccade duration. Models are developed offline based upon fitting conducted over the entire saccade duration. This proposed methodology offers the benefit of producing physiologically-meaningful saccade landing position predictions without requiring the online solution of the underlying non-linear optimization problem mandated in determining the characteristic parameter set for the previously proposed scaled Gaussian profile. Benchmarking versus a slight variation of the previously proposed technique demonstrated that although RMSE prediction accuracy was reduced on the order of a factor of 2 (corresponding to an RMSE percent accuracy reduction of 2.25% computed for the ideal step stimulus amplitude), the requisite execution time is reduced by 3 orders of magnitude for the computational workflow considered herein. For all cases considered, increasing window duration provided limited to no marginal improvement in prediction accuracy. This latter result is promising for enhancing prediction speed in practical implementations.

While the reported results are encouraging, their generalizability is inherently limited by the level of pre-filtering that was performed to yield empirical profiles resembling the produced model functions. This analysis approach was chosen to establish a performance baseline for the highest possible data quality conditions. Future research will establish the performance of various saccade prediction methods in cases of varied data quality, and for a more diverse set of amplitude values and directions (i.e. vertical and oblique saccades).
Furthermore, subsequent work will attempt to optimize the general workflow introduced herein through application of standard best practices in regression approximation, including utilization of traditional feature selection algorithms, consideration of alternative regression models and optimization of associated hyperparameters, along with the consideration of alternative velocity profiles suitable for modeling a broader range of trajectories encountered in practice, such as the skewed model profile based upon the Wald distribution recently proposed [25]. This latter modification is especially promising for predicting the known skewed velocity profiles of large amplitude saccades.

References

1. Padmanaban, N., Konrad, R., Stramer, T., Cooper, E.A., Wetzstein, G.: Optimizing virtual reality for all users through gaze-contingent and adaptive focus displays. In: Proceedings of the National Academy of Sciences, p. 201617251 (2017)
2. Albert, R., Patney, A., Luebke, D., Kim, J.: Latency requirements for foveated rendering in virtual reality. ACM Trans. Appl. Percept. 14(4), 25 (2017)
3. Arabadzhiyska, E., Tursun, O.T., Myszkowski, K., Seidel, H.-P., Didyk, P.: Saccade landing position prediction for gaze-contingent rendering. ACM Trans. Graph. 36(4), 50 (2017)
4. Wang, S., Woods, R.L., Costela, F.M., Luo, G.: Dynamic gaze-position prediction of saccadic eye movements using a Taylor series. J. Vis. 17(14), 3 (2017)
5. Han, P., Saunders, D.R., Woods, R.L., Luo, G.: Trajectory prediction of saccadic eye movements using a compressed exponential model. J. Vis. 13(8), 27 (2013)
6. Bahill, A.T., Clark, M.R., Stark, L.: The main sequence, a tool for studying human eye movements. Math. Biosci. 24(3–4), 191–204 (1975)
7. Paeye, C., Schütz, A.C., Gegenfurtner, K.R.: Visual reinforcement shapes eye movements in visual search. J. Vis. 16(10), 15 (2016)
8. Rayner, K.: Eye movements in reading and information processing: 20 years of research. Psychol. Bull. 124(3), 372 (1998)
9. Wedel, M., Pieters, R.: A review of eye-tracking research in marketing, pp. 123–147. Emerald Group Publishing Limited (2008)
10. Bednarik, R., Kinnunen, T., Mihaila, A., Fränti, P.: Eye-movements as a biometric, pp. 780–789 (2005)
11. Patney, A., et al.: Towards foveated rendering for gaze-tracked virtual reality. ACM Trans. Graph. 35(6), 179 (2016)
12. Banks, M.S., Sekuler, A.B., Anderson, S.J.: Peripheral spatial vision: limits imposed by optics, photoreceptors, and receptor pooling. J. Opt. Soc. Am. A 8(11), 1775 (1991)
13. Rayner, K.: The gaze-contingent moving window in reading: development and review. Vis. Cognit. 22(3–4), 242–258 (2014)
14. Nuthmann, A.: How do the regions of the visual field contribute to object search in real-world scenes? Evidence from eye movements. J. Exp. Psychol. Hum. Percept. Perform. 40(1), 342 (2014)
15. Prince, S.J., Rogers, B.J.: Sensitivity to disparity corrugations in peripheral vision. Vis. Res. 38(17), 2533–2537 (1998)
16. Duchowski, A.T., Bate, D., Stringfellow, P., Thakur, K., Melloy, B.J., Gramopadhye, A.K.: On spatiochromatic visual sensitivity and peripheral color LOD management. ACM Trans. Appl. Percept. 6(2), 9 (2009)
17. Saunders, D.R., Woods, R.L.: Direct measurement of the system latency of gaze-contingent displays. Behav. Res. Methods 46(2), 439–447 (2014)
18. Diamond, M.R., Ross, J., Morrone, M.C.: Extraretinal control of saccadic suppression. J. Neurosci. 20(9), 3449–3455 (2000)
19.
Mathôt, S., Melmi, J.-B., Castet, E.: Intrasaccadic perception triggers pupillary constriction. PeerJ 3, e1150 (2015)
20. Anliker, J.: Eye movements: online measurement, analysis, and control. In: Eye Movements and Psychological Processes (1976)
21. Salvucci, D.D., Goldberg, J.H.: Identifying fixations and saccades in eye-tracking protocols, pp. 71–78 (2000)
22. Bahill, A.T., Latimer, J.R., Troost, B.T.: Linear homeomorphic model for human movement. IEEE Trans. Biomed. Eng. 11, 631–639 (1980)
23. Holmqvist, K., Nyström, M., Mulvey, F.: Eye tracker data quality: what it is and how to measure it, pp. 45–52 (2012)
24. Friedman, L., Rigas, I., Abdulin, E., Komogortsev, O.V.: A novel evaluation of two related and two independent algorithms for eye movement classification during reading. Behav. Res. Methods (2018)
25. Griffith, H., Biswas, S., Komogortsev, O.V.: Towards improved saccade landing position estimation using velocity profile methods. In: IEEE SoutheastCon 2018, St. Petersburg, FL (2018)

Wireless Power Transfer Solutions for 'Things' in the Internet of Things

Tim Helgesen(✉) and Moutaz Haddara
Westerdals – Oslo School of Arts, Communication and Technology, Oslo, Norway
Timrobbyh@gmail.com, Hadmoa@westerdals.no

Abstract. The Internet of Things (IoT) has several applications in various industries and contexts. During the last decade, IoT technologies were mainly dominated by the supply chains and warehouses of large manufacturers and retailers. Recently, IoT technologies have been adopted in virtually all other fields, including healthcare, smart cities, and self-driving cars. While the opportunities for IoT applications are endless, challenges do exist. These challenges can be broadly classified as social, political, organizational, privacy, security, environmental, and technological challenges. In this paper, we focus on one dimension of the technological challenges, specifically on how IoT products/devices can be powered and charged without interruption, while either in use or in motion, since they are known to be intensively power-consuming objects. This literature review paper explores how the emerging technology of Wireless Power Transfer (WPT) could aid in solving power and charging problems for various IoT devices. Our findings suggest that, in theory, WPT can indeed be used to solve the charging and power challenges of IoT's intelligent devices, or "things". However, we found that human exposure and safety, industrial context, environmental issues, and cost of technology are important factors that could affect WPT adoption in organizations.

Keywords: Wireless power transfer · Internet of Things · Wireless energy transfer · Literature review

1 Introduction

The Internet of Things (IoT) domain has increased in popularity and research focus in recent years, and is sometimes even described as the next big thing, much like the internet back in its early days [1]. IoT can be broadly described as a cyber-physical network where "smart" objects, or "things", communicate and cooperate with each other (and with humans) to create new applications or services to achieve a common goal [2, 3]. There are several formal definitions of IoT, and Vermesan et al. [4] proposed an ideal one: "The Internet of Things could allow people and things to be connected anytime, anyplace, with anything and anyone, ideally using any path/network and any service." [4, p. 12].
Through this connection of people and things, the goal is to achieve a better world where things know what we like, what we want, and give them to us with minimal human intervention [5]. Yet some simply describe IoT as increased machine-to-machine communication. However, as pointed out by Isenberg et al. [6], at its core the Internet of Things is more than just communication technologies; it goes beyond communication, as it endows the individual object, or "thing", with intelligence. These intelligence-equipped objects, or "Smart/Intelligent Products", can be described as physical objects equipped or coupled with computational software [6, 7]. Wong et al. [8] proposed several requirements for intelligent products: (1) the object should have a unique identification, (2) be able to communicate with its environment, such as other objects, (3) retain or store data about itself, (4) deploy a language to display its features, production requirements, etc., and (5) participate in or make decisions relevant to its own destiny. These criteria must also be met to enable the interaction between things [6].

One of the main challenges related to these intelligent products is their power consumption; that is, the power they need to be able to perform their functions normally [6, 7]. These functions could include communication through wireless technology, or the use of sensors [6]. Power is limited because these objects often move around, and therefore need a self-sufficient energy source, such as batteries, to power the mentioned functions [9]. Power consumption is a challenge that could affect the decision of which wireless technology can be used and adopted, because of the potential latency in the communication [9], or it could be a potential performance bottleneck [10]. Another problem is that battery replacement could be costly, especially in large-scale deployments and IoT infrastructures [11]. Beyond the cost of replacement, discarded batteries add to the ever-increasing electronic waste issue.

One solution to the power consumption problem is "clustering", as proposed by López et al. [7]. Clustering gives the possibility to manage the power of the devices by electing so-called representative network "members", which have the responsibility to collect and forward all communication within the network. These members are elected based on their residual energy, where devices under a pre-set percentage of energy will not be elected. However, this solution only slows battery consumption, as battery charging or replacement is still needed at some point in time. Another potential solution is the use of Bluetooth Low Energy (BTLE) technology, which allows greater battery efficiency compared to other communication technologies [12]. But this solution, again, only slows the inevitable, which is the replacement of batteries.

The remainder of the paper is structured as follows. First, an overview of wireless power transfer technology opportunities and challenges is provided in Sect. 2. The research methodology is discussed in Sect. 3, followed by an overview of the reviewed articles in Sect. 4. Section 5 provides an overview of the literature review's main findings. A discussion is provided in Sect. 6. Finally, research conclusions are provided in Sect. 7.

2 Wireless Power Transfer

Wireless power transfer (WPT) technology (see Fig.
1) is a technique also known as wireless charging or wireless energy transfer (WET) [13]. WPT can be briefly explained as the process of transmitting electricity from one power system to another through an air gap via, for instance, an electromagnetic field or electromagnetic radiation [10]. Wireless charging happens when one of the transmitting systems is constantly powered, and therefore continues to transfer power until the other system/device is fully charged [14]. The object that transmits power is commonly referred to as the power source (e.g. a charging station), and the object that receives the power is commonly referred to as the energy-harvesting object, or simply the "load" (e.g. a robot) [15, 16].

Fig. 1. Generic wireless power transfer illustration.

While this technology has the potential to completely reshape the IoT landscape, there is little research surrounding wireless power transfer in the IoT context. The aim of this study is to explore the current literature and identify the potential use and applications of WPT technologies to wirelessly charge intelligent products, or things, and answer the following two main research questions:

• What wireless power transfer technologies could potentially solve the power challenges related to intelligent products?
• What are the challenges following the use of wireless power transfer technologies?

3 Methodology

Literature review papers represent a well-established method for accumulating existing, documented, and state-of-the-art knowledge within a domain of interest. In this article we have applied a systematic review approach as described by Webster and Watson [17]. This approach is characterized by adopting explicit procedures and conditions, and involves the use of a variety of procedures combined with various search criteria to minimize bias as much as possible [18].

The review covers articles published between 2007 and February 2018. We have narrowed down the search process through the condition that the articles need to be published in peer-reviewed journals, edited books, or conference proceedings. Moreover, no delimitation has been imposed on the outlets' field, to enable potential research results from various fields. The following search procedures have been applied to provide a comprehensive and systematic methodology.

1. An initial search was done through Google Scholar. The search option was limited to articles' titles. The keywords wireless charging, wireless power transfer, wireless energy transfer, IoT, internet of things, and their combinations were used.
2. Due to their high relevance for research, other research databases were used. These databases included the ACM Digital Library, IEEE Xplore Digital Library, EBSCOhost and Springer. The search procedure was restricted to the same keywords as in the previous step. In addition to the title area, the abstract and keyword parts of the articles have been included in the search.
3. In order to minimize the search results, we have put a constraint that the papers included in this review must have at least five citations.
4. Additionally, we conducted a secondary search through scanning all of the selected articles' reference lists, to identify further potential literature sources.
5. The articles' abstracts were then carefully read by both authors to check their relevance for this review paper.
Only articles directly addressing wireless power transfer technologies within the IoT domain were selected.
6. Based on the preliminary review, two main categories of wireless transfer technology ranges were identified. Hence, the articles were classified into two main groups: near-field and long-field power transfer technologies.

The authors independently classified the articles into a concept matrix [17], which included the research themes. The results were then compared and discussed in order to achieve a consensus on each article's classification. It is important to mention that an article could fall into one or more themes, based on the article's technology focus. One of the main limitations of this research methodology is that some potentially relevant papers may have been omitted because they did not meet our condition of a minimum number of citations. Omitted research papers that were more recent, and therefore had a low number of citations, particularly affect the scope of this literature review.

4 Overview of the Articles

In total, we reviewed thirty articles that were published in various outlets; of these, 24 are journal articles, 1 is a conference proceeding, and 5 are articles in books. As seen in the following figure (Fig. 2), the review shows a gradual increase in research interest in wireless power transfer, with a maximum of 9 publications in 2016.

Fig. 2. Number of publications per year.

5 Main Findings

In the literature, several potential wireless power transfer technologies were identified and split into two main categories, near-field and long-field wireless charging technologies, as shown in the following table and discussed in this section (Table 1).

Table 1. Overview of research topics and their corresponding papers.
Range category | WPT technology                           | Papers
Near-field     | Inductive Power Transfer (IPT)           | [10, 15, 19–24]
Near-field     | Resonant Inductive Power Transfer (RIPT) | [10, 14–16, 25–32]
Near-field     | Capacitive Power Transfer (CPT)          | [33, 34]
Long-field     | Radio Frequency (RF) radiation           | [13, 35–38]
Long-field     | Microwave Power Transfer (MPT)           | [10, 15, 38–40]
Long-field     | Laser Power Transfer (LPT)               | [10, 41, 42]

5.1 Near-Field Power Transfer

(1) Inductive Power Transfer
Inductive power transfer (IPT), also known as inductive coupling, transfers power from one coil to another, and has been used for powering RFID tags and medical implants [26]. The field IPT generates is in the kilohertz range, and is typically used within a few millimeters to a few centimeters (20 cm) of the targeted load [15]. Power varies between watts and kilowatts depending on transmission efficiency [33]. The transmission efficiency decreases as range increases, and even more so if there is any misalignment between the coils [23, 25]. Beyond misalignment, any change to the range requires the coils to be recalibrated [39]. Loss of electricity through misalignment, range, or metallic objects between the coils will lead to an increase in heat [14, 15]. Due to its low transmission efficiency, the field is considered safe for humans [15]. In the IoT domain, this technology has been recommended for several applications. For example, Rim and Mi [24] explored the possibilities of wireless power transfer to electric vehicles and other mobile devices.

(2) Resonant Inductive Power Transfer
One of the earliest implementations of resonant inductive power transfer (RIPT) is Nikola Tesla's magnifying transmitter, or coil [43] (Fig. 3).
The magnifying transmitter succeeded in wirelessly transmitting power to power-harvesting objects, such as lamps, as shown in Fig. 3. Resonant inductive power transfer follows the same basic principles as IPT. However, this technology makes use of magnetic resonant coils, which operate at the same resonance frequency [10]. This technique creates a stronger coupling, and therefore increases the potential range and efficiency. The first documented optimal use of RIPT for WPT was performed by Kurs et al. [28], and achieved a transmission efficiency of around 90% at 1 m, and 40% at 2 m. Power varies between watts and kilowatts depending on transmission efficiency [34]. As with IPT, the transmission efficiency decreases as range increases, though RIPT has proven to have a longer range and better efficiency [15, 20, 28]. As with IPT, RIPT requires calibration for each change made to the distance or coil [39]. RIPT technology can charge multiple receivers at the same time, even if the receivers are out of sight [15]. As with IPT, the resonant field is considered safe for humans, which was shown by Ding et al. [19]. Thus, Bito et al. [32] have developed a real-time, electrically controlled wireless charging infrastructure and algorithms that can be used to recharge biomedical and implanted devices (e.g. pacemakers). This could effectively abolish the need for the surgical procedures that are currently necessary for occasional battery replacement.

Fig. 3. Tesla's magnifying transmitter wirelessly powering a lamp [44].

(3) Capacitive Power Transfer
Capacitive power transfer (CPT) is a coupling made up of two metal surfaces where electricity is transferred at the points of contact [33]. Though potentially cheaper than IPT and RIPT, CPT requires close contact between the two metal surfaces. Hence, it is greatly limited by range requirements [27, 33, 34]. CPT technology has only recently seen kilowatt-scale loads, and was overlooked until 2008, which could explain this [33].

5.2 Long-Field Power Transfer

(1) Radio Frequency Radiation
Radio frequency (RF) radiation uses radio frequencies emitted from an antenna to carry radiant energy [10]. It can send power from a meter up to several kilometers, depending on the technique used [15]. However, it has a very low efficiency rate, and requires line of sight to deliver power [29]; with regard to the low efficiency rate, one project reported a transmission efficiency of around 1% at 30 cm [10]. It also needs to know the location of the intended target [15]. Due to its health risks through exposure, radio frequency is commonly used and operated in low-power areas [15]. Boshkovska et al. [35] proposed a simultaneous wireless information and power transfer (SWIPT) model that enables information and power to be transferred wirelessly on the same waveforms. This model also extends the possibilities for IoT energy-harvesting devices, which also need continuous communication [35, 40]. One of the paramount obstacles for far-field wireless power implementations is the end-to-end power transfer efficiency and the optimization needed to increase the direct current power level at the output of the rectenna (energy harvester), without the need to increase the transmission power and waveform output [36].
Through simulations, Clerckx and Bayguzina [36, 37] and Huang and Clerckx [45] have provided models and algorithms that could potentially increase the transmission output of the waveforms and decrease power loss during radio-frequency-to-direct-current conversion in far-field transmissions.

(2) Microwave Power Transfer
Microwave power transfer (MPT) is a technique that increases transmission efficiency and range through, for instance, a parabolic dish, which focuses the radio waves [14, 22]. However, MPT requires complicated tracking mechanisms and a large scale of devices [15]. Galinina et al. [22] have proposed a framework for applying MPT techniques to transfer power to 5G devices, such as wearables, through beacons that facilitate a continuous supply of power, creating self-sustainable devices. Finally, Di Renzo and Lu [38] developed a stochastic mathematical model to analyze and optimize low-energy cellular-enabled mobile devices that have dual wireless information and beam power transmission capabilities.

(3) Laser Power Transfer
Another long-field technique is optical laser power transfer (LPT), which transmits power at visible or near-infrared frequencies [10]. However, like MPT, it requires complicated tracking mechanisms and a large spectrum of devices [15]. One of the potential applications of LPT is Industry 4.0, otherwise known as the 4th industrial revolution [2]. On a larger scale, with the emergence of cloud computing and the current advancements in mobile networks, billions of heterogeneous smart devices with different application requirements are connected to networks, and are currently generating large volumes of data that need to be processed in distributed cloud infrastructures [42]. Hence, Munoz et al. [42] have presented a platform that is currently under development, which utilizes fifth-generation (5G) mobile network technologies to develop new radio interfaces to cope with the exponential traffic growth, and integrates diverse networks from end to end with distributed cloud resources to deliver E2E IoT and mobile services. Moreover, a paper by Liu et al. [41] explored the possibilities of transforming the current Chinese power grid into a smart grid to enable IoT applications. The paper focuses on optical/laser technologies as enablers for IoT devices' communication and wireless charging through the grid.

6 Discussion

The reviewed articles are spread across 20 different outlets. Among these outlets, we have recognized only one special journal issue focusing on wireless power transfer technologies within the IoT context. As research interest in WPT for IoT is increasing, research outlets should pay more attention to this domain. In general, 30 articles across a 12-year period is a low number of publications. Although the need for research on WPT for IoT was recognized in previous literature, the amount of research conducted on this issue is still very limited. Thus, more research needs to be carried out in order to gather sufficient knowledge about this phenomenon, as WPT in IoT has not received appropriate attention compared to other IoT-related topics. Based on our literature review of WPT in IoT, in the following part we answer our research questions and present some research gaps and future research suggestions.

To answer the first question: what wireless power transfer technologies could potentially solve the power challenges related to intelligent products?
It is apparent that virtually all of the technologies identified in the literature could solve the device charging and power harvesting challenges that were discussed earlier in this paper. However, the decision of which of these technologies would be the best fit should be based on several factors. One factor is the target environment. For instance, one type of environment could be an industrial workplace, where intelligent devices are being used to inform users about exposure to hazardous equipment, such as in the case of Kortuem et al. [46]. Since this would most likely be a very open and dynamic environment, microwave power transfer could be used through the use of power beacons (PBs), as recommended by Huang and Lau [39]. Likewise, the use of a capacitive power transfer solution is also viable, where the smart object has to be placed on top of a charging platform when at rest, though this would require the device to hold out until it is charged. This technology is very similar to existing wireless mobile phone charging stations. The decision of which solution would be the best fit should take into consideration another factor: cost. Though costs might be reduced due to the absence of batteries that would need to be replaced, the high implementation costs of the technology are still factors to be considered. Implementation costs could include, for example, the price of replacing traditional charging cords with wireless chargers, and the cost of installing wireless power receivers in the intelligent products [15], though this would depend on the chosen technology. Cost is also affected by the required charging range, as long-range charging is not as effective as wired charging and therefore consumes more electricity. Another factor is the size of the object/device. It has been pointed out that both inductive and resonant inductive coupling require a relatively large receiver for effective long-range charging [15], though this most likely depends on the amount of power the device needs, as Cannon et al. [26] pointed out that one large source-coil transponder can be used to charge many small load-coil receivers. However, the most important factor should be the planned performance level of the smart object, as on-the-go charging allows for more power consumption, therefore opening the way for more functions. The goal should be to utilize this extra power to increase the performance of the smart object. To illustrate this, clustering, as explained in the introduction, was proposed to slow power consumption at the cost of real-time data and could lead to potentially disconnected environments. However, always having the power needed to perform their functions would give devices always-available real-time data, communication, and coordination, which is closer to the ideal definition of IoT.

Regarding the second research question: what are the challenges following the use of wireless power transfer technologies? There are some general challenges with the use of WPT technology. One of the paramount challenges is how businesses could outweigh the cost of acquiring and using the technology in terms of business value. Another challenge is to implement the technology in an optimal way, so that it does not disrupt or slow down business processes. In addition, the technology must be implemented in a way that does not pose any potential health risks to humans in the vicinity. Based on this review's findings, several research gaps have been identified.
For example, it is evident that the majority of the reviewed papers focused more on near-field wireless power transfer technologies than on the far-field context. As discussed earlier, the longer the range, the more wireless power is needed to charge distant objects, which could be inefficient and costly for the time being. Thus, more research is needed in order to find power optimization techniques among available power sources and power harvesting devices. It is also palpable that very little research has been conducted within the laser power transfer domain in far-field WPT. This lack of research could possibly be explained by the expensive infrastructure required to implement this technology. In addition, as virtually all of the papers reviewed are highly technical papers (mostly in IEEE outlets), there is also an apparent research gap on the business value and feasibility of the different WPT technologies from a business perspective. Furthermore, almost none of the papers have reported a real-world case study on WPT implementations within businesses. This could explain the slow adoption of WPT technology in this particular domain, as bridging research between technical and business issues is needed to reach managers and to increase businesses' awareness of such technologies.

7 Conclusions

This paper contributes to both research and practice by providing a comprehensive literature review on the potential of wireless power transfer technologies in the IoT domain. For practice, the paper sheds light on past and recent issues as well as challenges that can guide IoT consultants, vendors, and clients in their future projects. For researchers, the organization of the literature into the different WPT technologies can aid them in identifying the topics, findings, and gaps discussed in each technology of interest. Finally, we have provided our observations and future research suggestions that would enrich knowledge in this domain.

References

1. Sajid, O., Haddara, M.: NFC mobile payments: are we ready for them? In: SAI Computing Conference (SAI), 2016, pp. 960–967 (2016)
2. Haddara, M., Elragal, A.: The readiness of ERP systems for the factory of the future. Procedia Comput. Sci. 64, 721–728 (2015)
3. Misra, G., Kumar, V., Agarwal, A., Agarwal, K.: Internet of Things (IoT)—a technological analysis and survey on vision, concepts, challenges, innovation directions, technologies, and applications (an upcoming or future generation computer communication system technology). Am. J. Electr. Electron. Eng. 4, 23–32 (2016)
4. Vermesan, O., Friess, P., Guillemin, P., Gusmeroli, S., Sundmaeker, H., Bassi, A., et al.: Internet of Things strategic research roadmap. In: Internet of Things-Global Technological and Societal Trends, vol. 1, pp. 9–52 (2011)
5. Perera, C., Zaslavsky, A., Christen, P., Georgakopoulos, D.: Context aware computing for the Internet of Things: a survey. IEEE Commun. Surv. Tutor. 16, 414–454 (2014)
6. Isenberg, M.-A., Werthmann, D., Morales-Kluge, E., Scholz-Reiter, B.: The role of the Internet of Things for increased autonomy and agility in collaborative production environments. In: Uckelmann, D., Harrison, M., Michahelles, F. (eds.) Architecting the Internet of Things, pp. 195–228. Springer, Berlin (2011)
7. López, T.S., Brintrup, A., Isenberg, M.-A., Mansfeld, J.: Resource management in the Internet of Things: clustering, synchronisation and software agents. In: Uckelmann, D., Harrison, M., Michahelles, F. (eds.)
Architecting the Internet of Things, pp. 159–193. Springer, Berlin (2011)
8. Wong, Y., McFarlane, D., Zaharudin, A.A., Agarwal, V.: The intelligent product driven supply chain. In: 2002 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, p. 6 (2002)
9. Mattern, F., Floerkemeier, C.: From the internet of computers to the Internet of Things. In: Sachs, K., Petrov, I., Guerrero, P. (eds.) From Active Data Management to Event-Based Systems and More, pp. 242–259. Springer, Berlin (2010)
10. Xie, L., Shi, Y., Hou, Y.T., Lou, A.: Wireless power transfer and applications to sensor networks. IEEE Wirel. Commun. 20, 140–145 (2013)
11. Miorandi, D., Sicari, S., De Pellegrini, F., Chlamtac, I.: Internet of Things: vision, applications and research challenges. Ad Hoc Netw. 10, 1497–1516 (2012)
12. Swan, M.: Sensor mania! The Internet of Things, wearable computing, objective metrics, and the quantified self 2.0. J. Sens. Actuator Netw. 1, 217–253 (2012)
13. Yuan, F., Jin, S., Wong, K.K., Zhao, J., Zhu, H.: Wireless information and power transfer design for energy cooperation distributed antenna systems. IEEE Access 5, 8094–8105 (2017)
14. Chawla, N., Tosunoglu, S.: State of the art in inductive charging for electronic appliances and its future in transportation. In: 2012 Florida Conference on Recent Advances in Robotics, pp. 1–7 (2012)
15. Lu, X., Wang, P., Niyato, D., Kim, D.I., Han, Z.: Wireless charging technologies: fundamentals, standards, and network applications. IEEE Commun. Surv. Tutor. 18, 1413–1452 (2016)
16. Lu, X., Wang, P., Niyato, D., Han, Z.: Resource allocation in wireless networks with RF energy harvesting and transfer. IEEE Netw. 29, 68–75 (2015)
17. Webster, J., Watson, R.T.: Analyzing the past to prepare for the future: writing a literature review. MIS Q. 26, xiii–xxiii (2002)
18. Bryman, A.: Social Research Methods. OUP, Oxford (2012)
19. Ding, P.-P., Bernard, L., Pichon, L., Razek, A.: Evaluation of electromagnetic fields in human body exposed to wireless inductive charging system. IEEE Trans. Magn. 50, 1037–1040 (2014)
20. Hui, S.Y.R., Zhong, W., Lee, C.K.: A critical review of recent progress in mid-range wireless power transfer. IEEE Trans. Power Electron. 29, 4500–4511 (2014)
21. Zhao, B., Kuo, N.-C., Niknejad, A.M.: An inductive-coupling blocker rejection technique for miniature RFID tag. IEEE Trans. Circuits Syst. I Regul. Pap. 63, 1305–1315 (2016)
22. Galinina, O., Tabassum, H., Mikhaylov, K., Andreev, S., Hossain, E., Koucheryavy, Y.: On feasibility of 5G-grade dedicated RF charging technology for wireless-powered wearables. IEEE Wirel. Commun. 23, 28–37 (2016)
23. Imura, T., Hori, Y.: Maximizing air gap and efficiency of magnetic resonant coupling for wireless power transfer using equivalent circuit and Neumann formula. IEEE Trans. Ind. Electron. 58, 4746–4752 (2011)
24. Rim, C.T., Mi, C.: Wireless Power Transfer for Electric Vehicles and Mobile Devices. Wiley, Hoboken (2017)
25. Beh, T.C., Kato, M., Imura, T., Oh, S., Hori, Y.: Automated impedance matching system for robust wireless power transfer via magnetic resonance coupling. IEEE Trans. Ind. Electron. 60, 3689–3698 (2013)
26. Cannon, B.L., Hoburg, J.F., Stancil, D.D., Goldstein, S.C.: Magnetic resonant coupling as a potential means for wireless power transfer to multiple small receivers. IEEE Trans. Power Electron. 24, 1819–1825 (2009)
27. Hui, S.: Planar wireless charging technology for portable electronic products and Qi. Proc.
IEEE 101, 1290–1301 (2013)
28. Kurs, A., Karalis, A., Moffatt, R., Joannopoulos, J.D., Fisher, P., Soljacic, M.: Wireless power transfer via strongly coupled magnetic resonances. Science 317, 83–86 (2007)
29. Xie, L., Shi, Y., Hou, Y.T., Sherali, H.D.: Making sensor networks immortal: an energy-renewal approach with wireless power transfer. IEEE/ACM Trans. Netw. 20, 1748–1761 (2012)
30. Choi, B.H., Thai, V.X., Lee, E.S., Kim, J.H., Rim, C.T.: Dipole-coil-based wide-range inductive power transfer systems for wireless sensors. IEEE Trans. Ind. Electron. 63, 3158–3167 (2016)
31. Yeo, T.D., Kwon, D., Khang, S.T., Yu, J.W.: Design of maximum efficiency tracking control scheme for closed-loop wireless power charging system employing series resonant tank. IEEE Trans. Power Electron. 32, 471–478 (2017)
32. Bito, J., Jeong, S., Tentzeris, M.M.: A real-time electrically controlled active matching circuit utilizing genetic algorithms for wireless power transfer to biomedical implants. IEEE Trans. Microw. Theory Tech. 64, 365–374 (2016)
33. Dai, J., Ludois, D.C.: A survey of wireless power transfer and a critical comparison of inductive and capacitive coupling for small gap applications. IEEE Trans. Power Electron. 30, 6017–6029 (2015)
34. Dai, J., Ludois, D.C.: Wireless electric vehicle charging via capacitive power transfer through a conformal bumper. In: 2015 IEEE Applied Power Electronics Conference and Exposition (APEC), pp. 3307–3313 (2015)
35. Boshkovska, E., Koelpin, A., Ng, D.W.K., Zlatanov, N., Schober, R.: Robust beamforming for SWIPT systems with non-linear energy harvesting model. In: 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5 (2016)
36. Clerckx, B., Bayguzina, E.: Waveform design for wireless power transfer. IEEE Trans. Signal Process. 64, 6313–6328 (2016)
37. Clerckx, B., Bayguzina, E.: Low-complexity adaptive multisine waveform design for wireless power transfer. IEEE Antennas Wirel. Propag. Lett. 16, 2207–2210 (2017)
38. Renzo, M.D., Lu, W.: System-level analysis and optimization of cellular networks with simultaneous wireless information and power transfer: stochastic geometry modeling. IEEE Trans. Veh. Technol. 66, 2251–2275 (2017)
39. Huang, K., Lau, V.K.: Enabling wireless power transfer in cellular networks: architecture, modeling and deployment. IEEE Trans. Wirel. Commun. 13, 902–912 (2014)
40. Bi, S., Zeng, Y., Zhang, R.: Wireless powered communication networks: an overview. IEEE Wirel. Commun. 23, 10–18 (2016)
41. Liu, J., Li, X., Chen, X., Zhen, Y., Zeng, L.: Applications of Internet of Things on smart grid in China. In: 2011 13th International Conference on Advanced Communication Technology (ICACT), pp. 13–17 (2011)
42. Munoz, R., Mangues-Bafalluy, J., Vilalta, R., Verikoukis, C., Alonso-Zarate, J., Bartzoudis, N., et al.: The CTTC 5G end-to-end experimental platform: integrating heterogeneous wireless/optical networks, distributed cloud, and IoT devices. IEEE Veh. Technol. Mag. 11, 50–63 (2016)
43. Brown, W.C.: The history of power transmission by radio waves. IEEE Trans. Microw. Theory Tech. 32, 1230–1242 (1984)
44. Tesla, N.: The Problem of Increasing Human Energy: With Special Reference to the Harnessing of the Sun's Energy. Cosimo Inc., New York (2008)
45. Huang, Y., Clerckx, B.: Waveform optimization for large-scale multi-antenna multi-sine wireless power transfer.
In: 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5 (2016)
46. Kortuem, G., Kawsar, F., Sundramoorthy, V., Fitton, D.: Smart objects as building blocks for the Internet of Things. IEEE Internet Comput. 14, 44–51 (2010)

Electronic Kintsugi
An Investigation of Everyday Crafted Objects in Tangible Interaction Design

Vanessa Julia Carpenter1, Amanda Willis2, Nikolaj "Dzl" Møbius3, and Dan Overholt1
1 Technical Doctoral School of IT and Design, Aalborg University, Copenhagen, Denmark {vjc,dano}@create.aau.dk
2 Simon Fraser University, Surrey, Canada
3 HumTek, Roskilde University, Roskilde, Denmark

Abstract. In the development of enhanced and smart technology, we explore the concepts of meaningfulness, tangible design and interaction with everyday objects through Kintsugi, the Japanese craft of repairing broken ceramics with gold. Through two workshops, this emergent design research develops an iterative prototype, Electronic Kintsugi, which explores how we can facilitate more human-to-human or human-to-self connection through a hybrid crafted everyday object. We identify three themes: (1) enhancing human connection through embedded or "magic" technology; (2) using everyday objects to prompt personal reflection and development; and (3) exploring transferable design principles of smart products with a device of undefined purpose, which converges traditional craft and technology.

Keywords: Craft · Internet of Things (IoT) · Tangible interaction · Everyday objects

1 Introduction

This work explores Kintsugi, the Japanese craft of repairing broken ceramics with gold, and explores how we can use capacitive touch to facilitate tangible interaction with an everyday, crafted object. We situate ourselves within interaction design and look to craft and tangible interaction related works. The grounding question for this work asks how we can facilitate more human-to-human or human-to-self connection through a digital/crafted hybrid everyday object, and which design benefits this can offer future technology. We explore this through three themes which emerge in our work about technology, craft and interaction. Much of the recent work within interaction design about tangible interaction has shown an increased focus on traditional craft work [1–4] and a return to tangible interaction [5–7] from screen interaction. Despite a focus on the craft and the tangible, in commercial areas a strong focus on app-based interaction, digital displays, and screen-based solutions has become the norm, even pushing towards virtual or augmented reality. Meanwhile, a number of critical views about the value of the Internet of Things (IoT) have recently been published [3, 8], and a wave of research and devices around the themes of mindfulness, self-exploration, reflection, and well-being is emerging [9, 10]. In this area of overlap, between screens and tangible interaction, between making devices and traditional craft, between the IoT devices and the mindfulness tools, we find ourselves interested in exploring the potential engagement qualities of non-screen, tangible interaction in the form of everyday crafted objects.
We are specifically interested in the physical nature of both the IoT gadgets and the mindfulness tools, as they tie into the physicality of crafted objects. We rely on physical objects in our lives, and while designing future smart homes, offices, cars, etc., we might benefit from a deeper understanding of how we relate to these physical things [11]. Núñez Pacheco and Loke elaborate: "A focus on a more reflective approach can offer fresh ways of understanding how the lived body interacts with artefacts, products and spaces" [12]. This speaks to how we can look further into understanding how humans can interact with 'things', and our focus is to take that further and ask how we can facilitate more human-to-human or human-to-self connection through a hybrid crafted everyday object.

2 Introducing Kintsugi as a Device to Explore Connection and Meaning Making

Electronic Kintsugi was developed as an investigation tool into how we could use everyday objects to explore human-to-human connection and human-to-self connection, and to find out whether we could develop something which intrigued and engaged people, moving from the IoT (Internet of Things) towards an appreciation and use of crafted, tangible, interactive, everyday objects. Electronic Kintsugi is a platform for exploration and meaning-making, an opportunity to engage with others and with oneself, and to create new narratives. In our work, our context was Japan's artisanal craft of Kintsugi, where we developed our work with a Kintsugi artist, and our focus was on the tangible, non-screen interaction properties of how a device with an undefined purpose might exist in between these realms of traditional craft, technology and sound. Inspired by Tsaknaki and Fernaeus' work Expanding on Wabi-Sabi as a Design Resource in HCI [13], where they explored unfinished craft and interaction design, the authors created a device and facilitated two participatory workshops exploring the Japanese craft of Kintsugi: mending broken ceramics with a precious metal to make them more beautiful and valuable than before. These concepts were adopted with the creation of Electronic Kintsugi: a sound- or light-reactive piece of repaired ceramics with touch interaction on the precious metal seams. Our interest is in the aesthetics of individuality and human touch, and to explore and respect the tradition of the craft of Kintsugi itself (video of Electronic Kintsugi here: https://youtu.be/p5Pu0-gZ3u0) (Fig. 1).

Fig. 1. Electronic Kintsugi in a design expert's home; the Kintsugi artist creating traces; first workshop explorations with light and sound.

3 Related Works: Exploring the Physical Qualities of Hybrid Tangible Embedded Interaction, Through Crafted "Things"

The literature review examined works where craft is referenced for the transferable physical qualities of interaction design: material, texture, touch, and recognition of craftsmanship, as opposed to the sleek, smooth, machined surfaces of our current smart products. We see this as a natural progression from a screen-based society, moving towards embodied engagement and beyond the swipe interaction of the "black mirror" (screen) as described by Rose [14]. Three thematic findings informed our prototype and workshop development.

3.1 Traditional Craft as a Starting Point for Exploration

Tsaknaki and Fernaeus explore craft in depth in a variety of their works, and thereby evaluate the role of interaction design in craft.
In their work on Wabi-Sabi, Tsaknaki and Fernaeus [13] present the concept of Wabi-Sabi and the idea to "approach perfection through explicitly unfinished designs". We embrace the concept of unfinished design with Electronic Kintsugi, deliberately designing an unfinished device to prompt curiosity and exploration of the prototype. In their work with leather, Tsaknaki, Fernaeus, and Schaub [15] explore how leather can be a touch-based, rich material for tangible interactions. This work informs how we can look to everyday materials, in our case ceramics, for stroking interaction, much like the leather interactions of their SoundBox. In exploring silversmithing, Tsaknaki, Fernaeus, Rapp and Belenguer [16] both engaged local artisans and focused especially on the "cultural and historical significance" of the craft, and explored the design "space of screen-less" interactions. This finding informed our choice of working with the Japanese artisanal craft of Kintsugi, where we developed our work with a Kintsugi artist, and our focus was on the tangible, non-screen interaction properties of how a device with an undefined purpose might exist in between these realms of traditional craft and technology.

3.2 Designing from Everyday Things with Social Implications in Mind

In recent works about the Internet of Things (IoT), Cila, Smit, Giaccardi and Kröse [8], Nordrum [17], and Lingel [3] all explore the social significance of the "thing" and suggest that we need to look not only at the everyday (home and workplace) but also at the social and cultural implications of these everyday interactions with things. Our work focuses on this "thing" and thus on the development of Electronic Kintsugi.

3.3 Technology and Touch

Significant work has been done in the field of interaction design with regard to touch, and in the interest of space we do not cover that here; however, the particular work by Cranny-Francis [18] covers a sizeable portion of the touch research done within design. In Semefulness: a social semiotics of touch, Cranny-Francis introduces the experience of touch as 'semefulness' – "multiply significant, physically, emotionally, intellectually, spiritually, politically" [18]. She describes the 'tactile regime' of touch in culture, how it shapes how we engage with one another or with the tools we design and then use. She describes that "Touch is semeful in that it is full of meanings - physical, emotional, intellectual, spiritual and those meanings are socially and culturally specific and located." Here we can begin to touch upon the multi-faceted nature of Electronic Kintsugi. It is culturally and location specific to traditional Japanese craft; it is emotional to some - as an heirloom or a piece of valuable art; it fosters social interaction when acting as Electronic Kintsugi (see Sect. VI. C); and it is physical in nature, requiring touch, stroking, and holding the bowl. One ambition of Electronic Kintsugi is to enable meaningful experiences for the participants, and by addressing Cranny-Francis' 'semeful' attributes, we may begin to explore this domain.

3.4 A Focus on Audio and Playfulness

Schoemann and Nitsche [4] use the "Stitch Sampler", a sew-able musical instrument, to focus on embodiment via the act of sewing, and on audio feedback, "to respond to the crafter's personality". These qualities of craft, tangible non-screen interaction, and playfulness with sound inform our process, helping to frame the area we are exploring.
Electronic Kintsugi allows participants to explore the interaction qualities of a hybrid crafted device and consider its potential uses in their lives. We encourage curiosity, unexpected encounters, and reflections on those encounters. This speaks to our objective to inform future smart product design and encourage a tangible, non-screen interface which utilizes craft and the qualities of curiosity and reflectivity.

4 Methodology

Initially, we were fascinated by the idea of Kintsugi and made a basic prototype to explore possible values of Electronic Kintsugi. This work spans from the first prototype to two workshops, one in Japan and one in Denmark, six months apart. We present an overview of methods here and then describe each workshop and the findings in the following sections.

4.1 Workshop 1: Methods

The first workshop was designed in a collaborative process with FabCafe Tokyo and Kintsugi artist Kurosawa, where we combined electronics with an everyday "craft" object, involving the artisan in this process [16] so they could both introduce us to the nuances of the craft and help us to understand what we should be paying attention to. Following the process described by Tsaknaki, Fernaeus and Schaub [15] in their leather material explorations, we created a workshop session to explore the properties of Kintsugi and gain insight into the craft, and to investigate how our prototype was received by participants in that context. We used thin strips of copper tape to conduct electrical current and worked with the Kintsugi artist to carefully overlay the traces of precious metals where the repair had been, to emulate the traditional Kintsugi (see http://www.kurovsya.com/). The workshop consisted of two of the authors (one an electrical and mechanical engineer and the other an interaction designer), the Kintsugi artist, and seven participants of varying electronics skill levels who were recruited through an open FabCafe Tokyo Facebook event. During the workshop, the Kintsugi artist presented and demonstrated their process, allowing participants to try their hand at creating Kintsugi. The authors presented their work and the thoughts behind the Electronic Kintsugi. The workshop explored Kintsugi and interaction with it, using two familiar outputs, sound and light, which acted as examples of possible outputs, so that participants were able to extrapolate from this in terms of what the Electronic Kintsugi might be used for. We conducted the workshop in a focus-group style and did two rounds of explorative, hands-on evaluation. A questionnaire was developed to capture their experience (results in the section "First Workshop").

4.2 Second Iteration of the Electronic Kintsugi

Cila, Smit, Giaccardi and Kröse [8] describe the interventionist product, for creating dialogues, which senses, responds to, and interprets data. The Electronic Kintsugi was developed to sense touch and respond to it, and, for the second workshop, could interpret data, such as how often it is being stroked. After feedback from the first workshop, the Electronic Kintsugi was updated to be more responsive, while making it less apparent how the light interaction would emerge or how it would progress, in order to prompt explorative and playful behaviour with the device. Rather than a direct mapping, it had a certain level of ambiguity [19] via the programmed adaptive behaviours, based on how much it was interacted with and for how long, e.g., if it had been left alone, or off for a period.
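To make this adaptive behaviour more concrete (and anticipating the details given in the next paragraphs: several output variations per modality, cycled on a timer that is held back while the user keeps interacting), the following is a minimal, hypothetical sketch in Python. It is not the firmware that ran on the prototype, which used an Arduino-class board with capacitive touch sensing; the class name ModeCycler and all timing constants are illustrative assumptions.

```python
import time

class ModeCycler:
    """Illustrative simulation of the second-iteration behaviour:
    several output variations per modality, cycled on a timer, where
    ongoing interaction postpones the switch so the user's flow of
    interaction is not interrupted (all constants are assumptions)."""

    def __init__(self, n_modes=5, dwell_s=30.0, idle_gap_s=3.0):
        self.n_modes = n_modes        # e.g. five variations for sound (or light)
        self.dwell_s = dwell_s        # minimum time spent in one mode
        self.idle_gap_s = idle_gap_s  # pause in touching that permits a switch
        self.mode = 0
        self.mode_since = time.time()
        self.last_touch = 0.0

    def on_touch(self, touch_value):
        """Take a single parameter from the touch interface and return the
        (mode, value) pair that a sound or light reaction would map to
        tones, chords or LED patterns."""
        self.last_touch = time.time()
        return self.mode, touch_value

    def update(self):
        """Call periodically: switch mode only after the dwell time has
        passed AND the user has paused interacting."""
        now = time.time()
        dwelled = (now - self.mode_since) > self.dwell_s
        paused = (now - self.last_touch) > self.idle_gap_s
        if dwelled and paused:
            self.mode = (self.mode + 1) % self.n_modes
            self.mode_since = now

# Example use: feed in a normalized capacitance reading and poll the cycler.
cycler = ModeCycler()
mode, value = cycler.on_touch(0.42)
cycler.update()
```

Keeping the mapping from touch to output indirect in this way is what gives the device the "certain level of ambiguity" referred to above.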
Several touch-to-sound and touch-to-light reactions were developed for the workshop. Each reaction took input from the touch interface (built following the Touché for Arduino approach: http://www.instructables.com/id/Touche-for-Arduino-Advanced-touch-sensing/) and created a specific output in the form of either light or sound. Light was output on a strip of NeoPixels, and sound was synthesized using a software library (https://github.com/dzlonline/the_synth) and output to a speaker. The light reactions transform a single parameter from the touch interface into a specific light pattern on the LED display. Likewise, the sound reactions transform a single parameter from the touch interface into single tones, chords or evolving sound figures. In the second iteration, we wanted to increase the complexity [20] of interacting with the device so the interaction was less binary, such as a touch = a sound. Instead, it was decided to make the coupling between the input and output less apparent, giving the device the autonomy to interpret the frequency of interaction and respond accordingly. Within the second-iteration algorithm, there exist five cases for interaction modalities for either sound or light, meaning five for sound and five for light. There is a manual switch on the Electronic Kintsugi so participants can choose whether they are interacting with light or sound. These five cases were five variations in types of output, cycled through a timer based on interaction. If the user was interacting with the Electronic Kintsugi, then it would remain in that mode longer, until they paused interacting, so as not to interrupt their flow of interaction. Then it would move to the next mode. Each mode was a variation in output; for example, for sound, it might be different chords or tones. This had the purpose of giving the participant less time to recognize patterns in the behaviour and of enhancing the user's curiosity. We focused on how the interaction between the participants and the Electronic Kintsugi could be more tightly or loosely coupled, yet also incorporate elements of surprise, and on what implications this interaction had for the participants' association to the Electronic Kintsugi as a device, versus as an instrument, companion, or tool.

4.3 Workshop 2: Methods

The second workshop was scheduled six months after the first, due to travel and revisions to the technology and workshop design. Approaching workshop two, Wakkary et al. [11] published a work, "Morse Things", wherein they utilised a methodology for engaging design researchers to evaluate their everyday object by having the object in their home for some weeks, and then following up with a workshop with the design researchers to explore the experiences with the object. We adopted this methodology for our work and asked four design researchers to evaluate the Electronic Kintsugi in their homes for a period of five weeks, followed by a workshop. We chose to use this method in agreement with Wakkary et al., who explain, "A key motivation in our approach was the desire to deepen our investigation by including a wider range of experts that have the design expertise to perceive and investigate the nuanced and challenging notions of thing-centeredness."

4.4 Participant Selection and Introduction to Electronic Kintsugi

Opportunity sampling was used to select experts in design research from different backgrounds, aged 30–38, living in Copenhagen, to ensure different perspectives on the experience and imagined future uses.
Participants' names have been changed for their privacy. Their backgrounds are crossovers between the fields of engineering, interaction design, dance, performance design, industrial design, robotics, and hardware development. Participants were recruited by email, and it was explained to them that they would have the object in their home for 5 weeks and engage with it for a minimum of 15 min per week, spending another 15 min per week journaling their experiences. Participants were asked to keep a record of their thoughts and experiences, and to both keep these as a document and bring these thoughts to the workshop at the end. We found four researchers who were available to review how the device worked. Our goal here was to invite these experts to explore with us and find out what questions to ask participants [21]. We describe the specific methods we used during workshop 2 in the section "Second Workshop" to maintain continuity and legibility of this work (Fig. 2).

Fig. 2. Touching the traces on the Kintsugi bowl with the Electronic Kintsugi boxes displaying light and playing sound.

5 First Workshop: FabCafe Tokyo

Workshop 1 informed our work and set the scene for workshop 2. The workshop was conducted in both English and Japanese, and participants could communicate in their preferred language. We used a written questionnaire so participants could answer in their preferred language. We briefly present workshop one and then move on to reflect on findings from workshops one and two. After a brief demonstration of function, the Electronic Kintsugi was explored by participants. They touched the traces with one, two or all fingers, and tried turning the ceramics over, holding it in one hand or two. We explained that "the output could be anything, it could start your car, or feed your pet".

Since participants were familiar with the interaction technique after exploring the sound interaction, the light interaction had a much different approach. Participants knew how they could touch it, with one or several fingers, and they now focused on light or harder touches, strokes, or resting their finger on the traces. The light was much more unpredictable than the sound. Whereas with the sound they were acting almost as musicians, experimenting to find patterns and particular notes, with the light it was more about getting a bigger or smaller reaction than about the nuances in between these small or large bursts of light. One participant asked, "I want to know how much it's me that is controlling it and how much it is doing on its own".

5.1 Findings

We highlight several responses here from the questionnaire to inform future researchers in this field who might be interested in working further with this.

• Encouraging senses and emotions – Being able to handle the Kintsugi was a special experience: "There is a different feel to a real Kintsugi. It's rare to see the hitting of the device so profoundly." (P-1A) and "We're often not given permission to touch traditional art. It feels good to be encouraged to touch it." (P-1E).
• An interest in other senses: taste, smell, and food – One participant suggested it be used as a bowl to eat from: "Japanese people eat with bowls close to their mouth, so I want to see some sound installation when someone is eating" (P-1A), and another suggested that it could be used as a cat or dog food request device: "imagine the cat's tongue licking the Kintsugi!" (P-1C).
• Light – Unpredictable but has potential – One participant noted that the light reminded them of a starry sky and stated, "In a larger, or aesthetically ordered or different setting (night), it would be very soothing" (P-1C). Another participant was inspired and shared an idea: "The combination of the craft and the touch with the light feedback reminded me of the challenges of regaining fine motor control in a finger after an accident. The focus required and the tranquility of the lights may be a fun alternative physical therapy." (P-1E)
• Sound – Alive characteristics – One participant remarked, "Craft has character, especially as it ages. How might that character be represented as sound? I feel the sounds were lovely but not aligned with the character of the craftwork. Or maybe it had juxtaposition of sound quality and physical character which enhances the contrast between tradition and technology." (P-1F). Two participants related to the object in an anthropomorphic way, stating "It was like the cup was telling me how he/she's doing. Since Kintsugi part is a past wound, sometimes I felt like it's telling me it had pain." (P-1E).

5.2 Findings Summary

The workshop provided us with some considerations about the role of art and objects and the potential interactivity of these objects. Participants were excited to play with art and traditional craft-based objects. They were fascinated by the light and sound output and could extrapolate to imagine other interaction scenarios. They explored the aesthetic interaction qualities and played the Kintsugi like an instrument, using expressive hand gestures to explore the touch interaction. And they could reflect on the role of technology and tradition and how we live our lives: "Developing a closer, more physical relationship with the objects in our lives feels meaningful." (P-1E).

6 Second Workshop: Copenhagen

To prepare for the second workshop, we asked participants to spend 20 min in silence [22] completing a written activity to gather their pre-workshop thoughts and feedback prior to engaging in dialogue. We used Kujala, Walsh, Nurkka, and Crisan's [23] method of sentence completion to extract these initial reactions. We provided the instructions that participants should answer quickly (20 questions in 20 min); the beginning of each sentence was given and was then completed by the participant in a way they saw fit. Kujala and Nurkka [24] used categories of user values to classify questions. In Fig. 1, one can see the sentences we defined, as per each value category. We tried to make a nearly even number of positive and negative questions, and allowed extra space if they wished.

6.1 Sentence Completion Tool

A Likert scale [25] was used to determine their reactions to the sound and light interactions. We asked participants to rate the light and sound interaction. For light, we asked "I found the light output to be:" and gave one end of the scale the value "Calming" and the other end "Attention Seeking". For sound, we asked the same, but added an additional scale from "noise" to "music". We spent the remaining 2.5 hours engaged in a group discussion about their experiences, comparing, contrasting, and exploring possible future interactions.

6.2 Findings of Workshop Two

We used mind mapping as a technique to map out the responses from the discussion and journals [26]. We present here the results of the sentence completion as well as the discussion and journals.
6.3 Sentence Completion

We compared the sentence completion responses sentence by sentence and by category. The Electronic Kintsugi was described as "enjoyable, calming, interesting, and different" in the one-word descriptions. The findings from participants, ordered by the Sentence Completion Tool headlines [23], were:

General: Participants felt a sense of achievement when interacting with others and felt connected to it when it "reacted to my own and others touching it".

General: Predictability. They were disappointed and frustrated with the light interaction: "the light interaction was unpredictable, non-responsive and not interesting". It is noted here that in both workshops, the light was reported to be not as responsive as the sound. Participants in both workshops reported that they were more fascinated with the sound feedback, particularly because there were more nuances in the sound than in the light.

Emotional: Participants described their emotional response as "playfulness and companionship, calming, joy and puzzled" and again highlighted their frustration with the lights, describing them as "underwhelm(ing), disappoint(ing), and distanced". Two participants referenced the social values and stated that their best experiences were while playing with others.

Stimulation and epistemic: Participants described the changing soundscape and mentioned their desire to use it when someone asked about it.

Growth and self-actualization: Participants described both relaxation and concentration, as well as creative thinking and social interaction, as outcomes of their interactions with the Electronic Kintsugi.

Traditional values: Participants noted that, as an object in their home, it was "cute and modern", "playful and interactive" and that it "combined ceramics with playfulness".

Finally, in the extra space provided, three responses were thought provoking:
• I kept receipts in it and I liked how it became less precious and more functional
• I wonder if you were tracking my use
• It was a search into new creative possibilities.

The Likert scales gave us the results below, indicating that while results varied, the light was generally thought to be more attention seeking than calming, the sound was found to be generally more calming than attention seeking, and the sound was more musical than noisy.

"I found the light output to be:" (Calming = 1, Attention Seeking = 10) Average rating of 5.75 (Actual Rating Values = 8, 4, 4, 7)
"I found the sound output to be:" (Calming = 1, Attention Seeking = 10) Average rating of 3.75 (Actual Rating Values = 3, 3, 7, 2)
Extra question for sound: (Noise = 1, Music = 10) Average rating of 6.25 (Actual Rating Values = 6, 5, 5, 9)

From the discussion and journaling, three primary categories of interest emerged: (1) enhancing human connection through embedded or "magic" technology, (2) using a craft-based object to prompt personal reflection and development, and (3) exploring transferable design principles of smart products with a device which has no defined purpose, and which converges traditional craft and technology. In the accounts below, participants focused primarily on the sound-based interaction, as they were not interested in the light interaction and spent most of their time with sound (Fig. 3).

Fig. 3. The Electronic Kintsugi bowl with a design researcher; she is playing with the light as a break from work.
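As a quick arithmetic check, the reported averages follow directly from the four individual ratings listed above; the short Python snippet below (variable names are ours, purely illustrative) reproduces them.

```python
# Individual ratings from the four design experts (1-10 scales), as reported above.
light_attention = [8, 4, 4, 7]    # Calming = 1 ... Attention Seeking = 10
sound_attention = [3, 3, 7, 2]    # Calming = 1 ... Attention Seeking = 10
sound_musicality = [6, 5, 5, 9]   # Noise = 1 ... Music = 10

mean = lambda ratings: sum(ratings) / len(ratings)
print(mean(light_attention))    # 5.75
print(mean(sound_attention))    # 3.75
print(mean(sound_musicality))   # 6.25
```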
7 Three Themes Identified

7.1 Enhancing Human Connection Through Embedded or "Magic" Technology

There were several accounts of how the Electronic Kintsugi sparked social connections and interactions. Antonio had placed it in the kitchen, and he explained that the bowl on its own might not have sparked curiosity, but the box did, and visitors asked what it was and then wanted to play with it. Sandra was having an evening of entertaining guests, and as they were finally leaving (she was tired), she stood in the doorway and absent-mindedly touched the bowl as they were putting on their shoes. The guests became immediately intrigued, asked questions and wanted to play with it, which was both charming and exhausting, since, as Sandra explained, she was ready for them to go home, but also happy to play and show them the bowl. For Henry, it was a social life saver, as he suddenly found himself spending time with his father-in-law, who doesn't speak much English, while Henry doesn't speak much Danish. The Electronic Kintsugi came to the rescue as a medium they could explore together, without a need for verbal language. Martin explained that he took it on the bus and it was "totally inappropriate" there; it was loud and kept making screeching noises. He was frustrated with it, and imagined that if it had been quiet and making nicer sounds, as it often did (though not on the bus), then he could have asked others to join in on the playing. The 'magic' of the object was intriguing to people who didn't know what it was and sparked both play and conversation, even, in Sandra's case, when they should have been leaving. It offered a needed social lubricant in the case of Henry and sparked ideas on how to engage strangers on the bus for Martin. Having an everyday object with 'magical' and unexpected properties, without being a gadget or being used for some other purpose (a fancy remote, a communications device, etc.), seemed to be the key to sparking this social interaction. Unexpected qualities of playfulness via a changing soundscape were the right recipe for the Electronic Kintsugi.

7.2 Using an Everyday Object in Prompting Personal Reflection and Development

Our experts felt that an everyday object combining traditional craft and technology was important, commenting that they "wanted to come back to it again, it levels up, it evolves over time" (Martin) and "I love that it's not intuitive, you have to spend time with it and get to know it. It's nice that it doesn't have a defined purpose, somehow it's good to just have something nice and electronic in your home, especially with the copper tape, it feels like a crafted aesthetic, you can see craft, and the time put into it, but you can't see code, so somehow this makes tangible the craft of the code" (Henry). Sandra likened it to a "Tibetan singing bowl, you have to hit it just right and there's a pleasure behind controlling that energy". And Martin continued, "The electronics force you into movement, I've never done this with an Ikea bowl". Bringing together physical and digital materials, considering both the craft of the object and the craft of the code, and considering the social surroundings that the object inhabits were important aspects of creating a hybrid craft [16]. For us, it is the combination of these things which is a significant part of designing for meaningful interactions and experiences when working with future smart everyday products in the home.
7.3 The Role of an Object with a Non-defined Purpose

The fact that the purpose of the object was open-ended was well liked, and the participants used this opportunity to explore the possibilities with it. Some of their comments included "I love that it's not intuitive, you have to spend time with it and get to know it" (Martin) and "It was interesting, as a dancer, that I played a lot with the hand movements and did improvised hand movements" (Sandra). It was briefly discussed what it might be like to grow up with an object like this in your home, instead of an iPad or TV, and how that might change your perceptions of how you interact with the world and come to appreciate objects. Sandra explained, "I prefer it as an ornament, something non-connected. It can be a companion, or a container, such as for my receipts." The combination of a non-defined interaction purpose with the functionality of a common object, a bowl, seemed to work well to invite playful and curious interactions. While some experts poured water into the bowl to explore the sound, Antonio took it a step further and ate his breakfast cereal from the bowl: "it made me aware of how fast I was eating". (Interestingly, in workshop one, this was a suggestion from participants, that it could be nice to eat from the bowls.) The choice to use a bowl came from our fascination with Kintsugi and the tendency there to repair bowls, and we learned that, as a starting object for this exploration, a bowl has many inherent properties: something to eat from, to store things in, a decorative object, a historical object; it is nice to hold, and it exists in many cultures and many homes. Creating an object with a non-defined purpose can be one way to encourage curiosity, playfulness and an opportunity for the creation of meaningful or important moments in one's life, especially when there is a human-to-self (self-development) or human-to-human (social) aspect. On the contrary, further interaction design would be necessary once an object moves beyond being something with a non-defined purpose. In this work, our focus on a non-defined purpose does not disregard designing interactions for a specific context; rather, our focus is on designing interaction concepts at an earlier phase of project development.

8 Discussion

It is worthwhile to revisit Borgmann (as described by Fallman [19]) here, who worried that technology would "turn us into passive consumers, increasingly disengaged from the world and from each other" [19]. Our aim with Electronic Kintsugi, and a focus on designing for ambiguous interactions with everyday objects, is to move back towards each other, towards engagement with familiar objects, towards creativity and playfulness, and towards technology that is "not simply [a] neutral means for realizing human ends, but actively help[s] to shape our experiences of the world" [19]. Despite work in academia developing tangible, non-screen devices or criticising IoT (as presented earlier), the products which emerge on the market today do not abundantly reflect this. These products do not necessarily engage people on a human-to-human or human-to-self level and instead often cater to fixing a small problem without necessarily considering a more holistic impact. Cila, Smit, Giaccardi and Kröse [8] describe the current approach to IoT as being short-sighted and emphasize the potential for the role of interaction design in new smart things.
In our work, we expand on this and emphasize a need for smart things to perhaps be rooted in craft to enhance meaning-making, to utilize non-screen interaction, and to move towards facilitating human-to-human or human-to-self exploration. We further emphasize the role of a device with an undefined interaction purpose, as opposed to the very specific devices emerging on the market today, such as smart candles controllable via an app (https://www.ludela.com/) or smart hairbrushes (https://www.kerastase-usa.com/connected-brush). Although we needed to use copper tape to achieve the conductivity, in the future we would like to explore which material properties would allow a Kintsugi artist to create something more conductive using the traditional precious metals. Given this, the most significant aspect was the conceptual consideration of how one might interact with an object which has been created by an artist but is otherwise an 'everyday object' (one which we might find in our homes anyway, such as a bowl). Returning to Cranny-Francis' semefulness, we can see the physical, emotional, intellectual, spiritual, social, and cultural aspects [18] in the Electronic Kintsugi. We essentially augment a crafted object with technology, with the aim of creating an enchanted [14] everyday object with a historical, crafted background which is open to interpretation and explorative play. The role of an enchanted [14] everyday object is especially important to consider in a world of increasing IoT gadgets. Considering a future vision of connected everything, we feel it is important that we do not become too focused on the technology, such as having RFIDs under our skin [27] or being laden with smart tablets, smart watches or smart water bottles, but rather that we embrace humanness. We want to create devices which provoke thoughtful and critical reflection and engage people on a tangible level, not just a screen asking if you have been mindful today [28]. When considering the design of new 'smart' objects, we should perhaps ask, "does it need to be connected, and if so, why?", or "how can I enhance the existing values in this everyday object?" A door handle, for example, doesn't just open a door; it is the literal door to coming home from work, relaxing after a long day, seeing your family again, and more (from an interview with designer Carl Alviani: http://meaningfuldevices.vanessa-carpenter.com/2017/08/10/anything-but-personal-is-a-failure/). The affordances inherent in everyday objects are many, and it is our job as interaction designers not only to invent new technologies and uses but to consider how to support these values and avoid turning the objects in our world into cloud-connected gadgets. Electronic Kintsugi embraces new technology and established craft practices, emphasizing curiosity and playfulness while facilitating interaction between people and the self. Furthermore, we felt that the aspect of craft was a key identifier in what made the everyday object special. The history and delicate quality of the Kintsugi prompted multiple reactions: the participants in Japan were intrigued that they were allowed to play with a piece of art, and the participants in Denmark were eager to engage with, and learn more about, Kintsugi. Our primary concern was the investigation of a non-screen, tangible everyday object coming from a place of craft, and in future work we hope to further investigate how we could work with a Kintsugi artist to create a fully functional piece of Electronic Kintsugi, with capacitive traces in the piece.
9 Conclusion

In this work, we have presented Electronic Kintsugi: an exploration of how an everyday object (a bowl), in combination with artisanal craft (Kintsugi) and electronics (conductive sensing), could result in more human-to-human connection and human-to-self development. Through two workshops, one in Japan with a Kintsugi artist and participants, and one in Denmark with design research experts, we explored the properties of this Electronic Kintsugi, an interactive object with no defined purpose and two main interaction outputs - sound and light. We found that sound as feedback was of significant interest due to its nuanced nature and reactiveness, and between workshops the sound was programmed to evolve over time with use. Using copper tape, we augment a traditional, crafted object, namely Kintsugi, with electronics and call it Electronic Kintsugi, creating an open platform for play, exploration and development. In future work, we hope to continue working with Kintsugi artists to find a material which can be used in the craft practice and which would also be conductive enough for Electronic Kintsugi. We identified three categories of reflection from our studies with participants, and areas which future smart products can look to in order to enable more meaningful interactions between human and human, and between human and device. These categories are: (1) enhancing human connection through embedded or "magic" technology, (2) using everyday objects to prompt personal reflection and development, and (3) exploring transferable design principles of smart products with a device of undefined purpose, which converges traditional craft and technology. Finally, we discussed that, as interaction designers, we would like to focus on embracing humanness in future technology designs and could look to the values and affordances inherent in everyday objects to bring out these values and design for these moments in our lives.

Acknowledgment. We are grateful to FabCafe Tokyo, Kurosawa-San, the participants of workshop one, the design experts of workshop two, and all the user testers and helpers along the way.

References

1. Zheng, C., Nitsche, M.: Combining practices in craft and design. In: Proceedings of the Tenth International Conference on Tangible, Embedded, and Embodied Interaction (TEI 2017), pp. 331–340. ACM, New York (2017). https://doi.org/10.1145/3024969.3024973
2. Zoran, A., Buechley, L.: Hybrid reassemblage: an exploration of craft, digital fabrication and artifact uniqueness. Leonardo 46(1), 4–10 (2013). http://www.research.lancs.ac.uk/portal/en/publications/designing-information-feedback-within-hybrid-physicaldigital-interactions(4709b666-bbe3-46f8-ad3a-6d06fdd6f5cd)/export.html
3. Lingel, J.: The poetics of socio-technical space: evaluating the internet of things through craft. In: Proceedings of Conference on Human Factors in Computing Systems (CHI 2016). ACM, New York (2016). https://doi.org/10.1145/2858036.2858399
4. Schoemann, S., Nitsche, M.: Needle as input: exploring practice and materiality when crafting becomes computing. In: Proceedings of the Eleventh International Conference on Tangible, Embedded, and Embodied Interaction (TEI 2017). ACM, New York (2017). https://doi.org/10.1145/3024969.3024999
5. Hogan, T., Hornecker, E.: Feel it! See it! Hear it!
Probing tangible interaction and data representational modality. In: Proceedings of DRS 2016, Design Research Society 50th Anniversary Conference, Brighton, UK (2016)
6. Kettley, S., Sadkowska, A., Lucas, R.: Tangibility in e-textile participatory service design with mental health participants. In: Proceedings of DRS 2016, Design Research Society 50th Anniversary Conference, Brighton, UK (2016)
7. Mols, I., van den Hoven, E., Eggen, B.: Informing design for reflection: an overview of current everyday practices. In: Proceedings of the 9th Nordic Conference on Human–Computer Interaction (NordiCHI 2016). ACM, New York (2016). https://doi.org/10.1145/2971485.2971494
8. Cila, N., Smit, I., Giaccardi, E., Kröse, B.: Products as agents: metaphors for designing the products of the IoT age. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI 2017), pp. 448–459. ACM, New York (2017). https://doi.org/10.1145/3025453.3025797
9. Akama, Y., Light, A., Bowen, S.: Mindfulness and technology: traces of a middle way. In: Proceedings of the 2017 Conference on Designing Interactive Systems (DIS 2017), pp. 345–355. ACM, New York (2017). https://doi.org/10.1145/3064663.3064752
10. Mols, I., van den Hoven, E., Eggen, B.: Balance, cogito and dott: exploring media modalities for everyday-life reflection. In: Proceedings of the Eleventh International Conference on Tangible, Embedded, and Embodied Interaction (TEI 2017), pp. 427–433. ACM, New York (2017). https://doi.org/10.1145/3024969.3025069
11. Wakkary, R., Oogjes, D., Hauser, S., Lin, H., Cao, C., Ma, L., Duel, T.: Morse things: a design inquiry into the gap between things and us. In: Proceedings of the 2017 Conference on Designing Interactive Systems (DIS 2017), pp. 503–514. ACM, New York (2017). https://doi.org/10.1145/3064663.3064734
12. Núñez Pacheco, C., Loke, L.: Tacit narratives: surfacing aesthetic meaning by using wearable props and focusing. In: Proceedings of the Eleventh International Conference on Tangible, Embedded, and Embodied Interaction (TEI 2017), pp. 233–242. ACM, New York (2017). https://doi.org/10.1145/3024969.3024979
13. Tsaknaki, V., Fernaeus, Y.: Expanding on wabi-sabi as a design resource in HCI. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI 2016), pp. 5970–5983. ACM, New York (2016). https://doi.org/10.1145/2858036.2858459
14. Rose, D.: Enchanted Objects: Design, Human Desire, and the Internet of Things. Simon and Schuster, New York (2014)
15. Tsaknaki, V., Fernaeus, Y., Schaub, M.: Leather as a material for crafting interactive and physical artifacts. In: Proceedings of the 2014 Designing Interactive Systems (DIS 2014). ACM, New York (2014). https://doi.org/10.1145/2598510.2598574
16. Tsaknaki, V., Fernaeus, Y., Rapp, E., Belenguer, J.S.: Articulating challenges of hybrid crafting for the case of interactive silversmith practice. In: Proceedings of the 2017 Conference on Designing Interactive Systems (DIS 2017), pp. 1187–1200. ACM, New York (2017). https://doi.org/10.1145/3064663.3064718
17. Nordrum, A.: Popular Internet of Things Forecast of 50 Billion Devices by 2020 Is Outdated (2016). https://spectrum.ieee.org/tech-talk/telecom/internet/popular-internet-of-things-forecast-of-50-billion-devices-by-2020-is-outdated
18. Cranny-Francis, A.: Semefulness: a social semiotics of touch. Soc. Semiot. 21(4), 463–481 (2011). https://doi.org/10.1080/10350330.2011.591993
19. Fallman, D.: The new good: exploring the potential of philosophy of technology to contribute to human–computer interaction. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2011), pp. 1051–1060. ACM, New York (2011). https://doi.org/10.1145/1978942.1979099
20. Hobye, M.: Designing for Homo Explorens: Open Social Play in Performative Frames, pp. 16–17. Malmö University, Malmö (2014)
21. Bødker, S.: When second wave HCI meets third wave challenges. In: Mørch, A., Morgan, K., Bratteteig, T., Ghosh, G., Svanaes, D. (eds.) Proceedings of the 4th Nordic Conference on Human–Computer Interaction: Changing Roles (NordiCHI 2006), pp. 1–8. ACM, New York (2006). https://doi.org/10.1145/1182475.1182476
22. Martin, B., Hanington, B.: Universal Methods of Design. Rockport Publishers, Beverly (2012)
23. Kujala, S., Walsh, T., Nurkka, P., Crisan, M.: Sentence completion for understanding users and evaluating user experience. Interact. Comput. 26(3), 238–255 (2014). https://doi.org/10.1093/iwc/iwt036
24. Kujala, S., Nurkka, P.: Identifying user values for an activating game for children. In: Lugmayr, A., Franssila, H., Sotamaa, O., Näränen, P., Vanhala, J. (eds.) Proceedings of the 13th International MindTrek Conference: Everyday Life in the Ubiquitous Era (MindTrek 2009), pp. 98–105. ACM, New York (2009). https://doi.org/10.1145/1621841.1621860
25. Brooke, J.: SUS: a quick and dirty usability scale. In: Jordan, P., Thomas, B., Weerdmeester, B.A., McClelland, I. (eds.) Usability Evaluation in Industry, pp. 189–194. Taylor & Francis, London (1996)
26. Wheeldon, J., Faubert, J.: Framing experience: concept maps, mind maps, and data collection in qualitative research. Int. J. Qual. Methods (2009). https://doi.org/10.1177/160940690900800307
27. Astor, M.: Microchip implants for employees? One company says yes. New York Times (2017). https://www.nytimes.com/2017/07/25/technology/microchips-wisconsin-company-employees.html
28. Newman, K.M.: Free Mindfulness Apps Worthy of Your Attention. Mindful (2017). https://www.mindful.org/free-mindfulness-apps-worthy-of-your-attention/

A Novel and Scalable Naming Strategy for IoT Scenarios

Alejandro Gómez-Cárdenas, Xavi Masip-Bruin, Eva Marín-Tordera, and Sarang Kahvazadeh
Advanced Network Architectures Lab (CRAAX), Universitat Politècnica de Catalunya (UPC), Barcelona, Spain {alejandg,xmasip,eva,skahvaza}@ac.upc.edu

Abstract. Fog-to-Cloud (F2C) is a novel paradigm aimed at increasing the benefits brought by the growing Internet-of-Things (IoT) device population at the edge of the network. F2C is intended to manage the available resources from the core to the edge of the network, allowing services to choose and use either a specific cloud or fog offer or a combination of both. Recognizing the key benefits brought by F2C systems, such as low latency for real-time services, location-aware services, mobility support and the possibility to process data close to where they are generated, research efforts are being made towards the creation of a widely accepted F2C architecture. However, in order to achieve the desired F2C control framework, many open challenges must be solved. In this paper, we address the identity management challenges and propose an Identity Management System (IDMS) that is based on the fragmentation of the network resource IDs. In our approach, we divide the IDs into smaller fragments and then, when two nodes connect, they use a portion of their full ID (n fragments) for mutual identification.
The conducted experiments show that a significant reduction in both the query execution times and the space required to store IDs can be achieved when our IDMS is applied.

Keywords: IDMS · Identity management · Fog-to-Cloud · Resource identity

1 Introduction

The Internet of Things (IoT) is a communication paradigm that allows all kinds of objects to connect to the Internet. According to [1], by 2020 the number of connected devices will reach 50 billion, that is, 6.58 times the estimated world population for the same year. In line with the constant growth of the IoT device population, the amount of data generated at the edge of the network is growing as well. Every day, large volumes of data in all formats (video, pictures, audio, plain text, among others) are generated and then moved to cloud datacenters to be processed. In fact, it is estimated that in the near future a single autonomous car will produce up to 4 TB of data per day [2]. It is widely accepted that useful information can be extracted from such data using cloud-based data mining techniques. Nevertheless, moving large amounts of data from the edge to the datacenters located at the core of the network may incur significant overhead in terms of time, network throughput, energy consumption and cost [3]. To overcome these issues, novel computing paradigms such as fog computing have emerged at the edge of the network.

Fog computing is a paradigm intended to extend cloud computing capacities to the edge of the network, allowing data to be processed and aggregated close to where it is generated [4]. The fact that fog computing is deployed close to the end users' devices enables some key characteristics for IoT services and applications, such as low latency, mobility support, and location awareness [5]. Indeed, fog computing emerged to collaborate with cloud computing rather than to compete with it. More recently, the combined fog-to-cloud (F2C) paradigm [6] has been proposed to ease service execution in a hierarchical fashion, using fog, cloud, or a combination of both. Two ongoing initiatives work towards deploying such a hierarchical and combined F2C system: the OpenFog Consortium [7] and the mF2C project [8]. At an early stage, the mF2C project proposed a hierarchical, layered architecture in which the whole set of resources can be exploited in the cloud, in the fog, or in a combination of both. In mF2C, distributed fog nodes can be used for delay-sensitive, low-latency services and for processing at the edge of the network, while, in parallel, the cloud can be used for massive, long-term processing and storage.

In a realistic scenario, F2C is organized as a hierarchical three-tier architecture [9] where the most constrained devices are located at the lowest tier. The middle tier consists of nodes that act as aggregators of the resources available in the lower tier (see Fig. 1) and, finally, the cloud datacenter sits at the top of the hierarchy.

Fig. 1. Fog-to-Cloud general topology.

Certainly, the F2C resource continuum must be managed by a control strategy (a sort of control plane), but because there are still many challenges to be solved, the control concept as a whole is still an open issue for fog and, surely, for F2C systems. One of the challenges to be addressed in F2C systems is the lack of an Identity Management System that meets the specific requirements of the paradigm.
In F2C, the Identity Management System (IDMS) is the set of functions that provides a mechanism to assign and, more generally, to manage the resource identities of both physical and virtual devices. According to [10], the management of resource identifiers at the edge is very important for programming, addressing, thing identification, data communication, authentication, and security. Thus, the IDMS is a key component of the F2C control plane framework. In short, some of the features an IDMS should provide in an F2C system are: (i) the capability to scale smoothly along with the network; (ii) support for device mobility without loss of identity; (iii) security and privacy protection; (iv) interoperability among different service providers; and (v) support for highly dynamic network topologies.

In this paper, we focus on the IDMS challenge and propose a solution that addresses the aforementioned system requirements. The key contributions of our work, when compared with other available solutions, include mobility support, that is, the capability of edge devices to keep their identifiers even when they are on the move. Such ID persistence eases the mutual identification and authentication processes between a node and an aggregator node in future interactions. Likewise, the IDMS strategy we propose allows the size of the identifiers that resources use in the network to be adjusted without losing the identity uniqueness property. Finally, unlike other solutions, our proposal focuses on reducing the compute load required to identify the resources in the network. This benefits the entire network, especially the lowest layer of the hierarchy, where resources are very constrained and therefore must be managed as efficiently as possible.

The remainder of this paper is organized as follows. In Sect. 2 other IDMS solutions are reviewed. In Sect. 3 our IDMS proposal is described. The evaluation and results are presented in Sect. 4 and, finally, in Sect. 5 the conclusions and future work are discussed.

2 Related Work

In computer networks, the name and the address of a device stand for two different things. The general distinction between a name and an address is that a name can remain with an entity even when that entity is mobile and moves among different locations (i.e. addresses) [11]. From the IDMS perspective, the mobility support offered by F2C means that the identifiers assigned to the network resources are persistent, i.e., they remain even if attributes such as the location of the devices change. Therefore, addressing techniques are not the proper solution to manage resource identity in F2C. Rather, an IDMS that supports both static and mobile nodes in the network must be considered. Under this premise, in this section we pay special attention to IDMS solutions whose targets include IoT devices. The rationale for this decision is that, generally speaking, the IoT puts together static and mobile devices, so supporting all of them is mandatory in any solution to be deployed in the IoT arena.

In [12], the authors present a smart home operating system for the IoT named EdgeOSH. In EdgeOSH, the architecture component in charge of managing device identities is the naming module. This module allocates unique, human-friendly names describing the location (where), role (who) and data description (what) of the devices, for example, LivingRoom.CellingLight.Bulb2.
These names are used by the operating system to manage services, data and devices. Nevertheless, the way in which EdgeOSH manages device identities presents several drawbacks that prevent it from being used in F2C environments. For example, human-meaningful names make it easier to disclose sensitive information and to access unauthorized network resources through masquerade attacks. Another issue is that the scheme is not prepared to support the tremendously large number of devices expected in F2C; in other words, it is not scalable. Indeed, the authors themselves concluded that an efficient IDMS for the IoT is still an open problem and that further investigation is required.

Motivated by the need for an identity information service in which the provider of the service is unable to access the information that passes through its servers, the authors in [13] proposed BlindIdM, an Identity Management as a Service (IDaaS) model with a focus on data privacy protection. In this model three main types of actors are defined: users, service providers and identity providers. The user is a node in the network that holds the identity information of a set of entities, and its goal is to transfer that information to the service provider in a secure fashion. The authors claim that, through encryption techniques, BlindIdM permits the identity information to be sent from the user to the service provider without the identity provider being able to read it. To achieve this, the information is initially encrypted by the user, then re-encrypted by the identity provider and finally decrypted by the service provider. The results obtained during the evaluation of the proposal show acceptable times for the three cryptographic operations; however, it is important to note that these operations were performed by powerful cloud data centers. Given the decentralized nature of the F2C paradigm, it is likely that some of the key functions of the control plane will be executed at the edge of the network, including the identity management service. In this sense, the three cryptographic operations proposed by the authors may cause an important bottleneck, degrading the overall quality of service (QoS) of the system in terms of response times.

In [14] the authors introduce a user-centric identity management framework for the IoT. They propose the creation of a global identity provider (gIdP), responsible for maintaining global identities, which is used by the service providers (SP) to generate local identities. However, this proposal has two major drawbacks: (i) the global identity provider represents a single point of failure in the system, and such centralization contradicts the F2C paradigm; (ii) the proposed framework is intended to provide identities to users rather than to devices. In F2C, regardless of whether several devices belong to the same person, every node in the network must have its own unique identifier; thus, an object-centric approach should be applied.

The work in [15] presents a machine-to-machine IDMS that allows network devices to generate multiple pseudonyms to be used as identifiers in different applications. It uses anonymous attestation to verify a pseudonym, i.e., an interactive method by which one party proves to another that the pseudonym is valid and should be accepted, without revealing anything other than the validity of the pseudonym.
The problem with implementing this identity management strategy in an F2C system is that anonymous attestation involves a set of complex mathematical operations that the nodes have to perform in order to validate the identities of other nodes. These calculations will add a significant delay to connection establishment between nodes, mainly because of the low computational power of the devices at the lowest F2C layer.

3 IDMS Proposal

Our IDMS proposal consists in partitioning globally unique IDs into a set of smaller fragments (fg). This partitioning allows network resources to be identified by a fraction of their ID, instead of the full identifier, according to their position in the hierarchical F2C network, as shown in Fig. 2.

Fig. 2. Identifier fragmentation.

First of all, we define the connection layers of the hierarchical F2C network. In F2C, the scope of a connection between two nodes is given by the node at the higher hierarchical level. According to [9], three layers are identified at an early stage of the F2C system. However, this three-layer view does not consider inter-service-provider interaction; we therefore add a fourth layer and distinguish the following connection layers:

– Edge: This connection layer covers all connections among resources (physical devices or virtual entities) under the same fog node. The resources that form an area at the edge layer are located geographically close to each other; for example, an edge area could be a hospital building or a school.
– Fog: This connection layer includes the connections among fog nodes and the resources they aggregate. An example is a connection between a sensor and another device grouped under different fog nodes.
– Cloud: This connection layer includes all resource connections established within the same service provider. The main difference with respect to the fog layer is that the resources may be located geographically far from each other, for example, resources in different cities connected by the same Internet Service Provider.
– Global: This connection layer covers all connections among resources in a global scope. In this context, the resources may or may not be located close to each other, and inter-service-provider connectivity plays a key role; for example, a connection between two smart cities served by two different providers.

Figure 3 presents the four described F2C connection layers and their borders. Since the number of layers in the F2C architecture may change, the set of connection layers and the ID fragmentation policy may change as well, so as to stay aligned with the number of F2C layers. It is therefore worth highlighting that this is a simple, illustrative approach.

Fig. 3. Hierarchical F2C network connections.

Once all the F2C connection layers have been defined for the network topology, we divide the resource identifiers into n parts, where n is the number of connection layers defined in the F2C system. Now, every time a connection between two nodes is established in the network, the nodes use a fraction of the identifier, rather than the full identifier, for mutual identification. The number of fragments to be used in each connection depends on the node at the higher hierarchical level. For example, in the F2C network topology illustrated in Fig. 3, device (b) connects to fog node #2, so the connection is set as a fog connection and only two fragments of the global identifier are used during the identification process. In general, from the connection and topological perspective, nodes located at higher layers need to use more ID fragments, and consequently the ID used during connections with other nodes will be longer. The reason is that nodes in higher layers have more devices as children; to be able to identify each of these devices, longer identifier prefixes are required.

Regarding the division of identities into fragments, the length of the fragments may vary according to the different use cases and implementation needs. The lowest layer in an F2C system is the IoT layer. In the IoT layer, the length of the first fragment depends on the maximum number of resource IDs that a fog node can store in its cache during a given period of time, that is, the identifier cache size: larger identifier caches in the fog nodes entail longer identifier fragments. IoT devices usually have limited resources, so small cache sizes can be expected in this layer. Fog nodes can play a key role in adjusting the ID fragment length so that collision problems do not arise. Collision problems in naming are addressed in [16–18]. In the proposed identity management scheme, a collision occurs when two or more resources in the same F2C connection use the same identifier. Since the purpose of an ID is to identify a resource unambiguously, the collision probability must be kept low.

In order to enhance the security and privacy of the IDMS, the full resource identifier is neither propagated nor stored throughout the network; it is only known by: (i) the resource to which the ID belongs; (ii) the fog node, as long as the resource is connected to the F2C network through it; and (iii) other resources in a global connection that require the full resource ID for proper identification. In short, preventing collisions during the identification process is the reason why nodes in a global connection use their full ID instead of a fraction of it. In our proposal, fog nodes play a key role because they perform the ID fragmentation and share the required resource ID fragments with other nodes according to the connection layer involved.
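To make the fragmentation scheme above concrete, the following minimal Python sketch shows one possible way to derive a hexadecimal global ID, expose only the prefix of fragments that a given connection layer requires, and estimate the collision probability for a given fragment length. The per-layer fragment lengths, layer names and function names are illustrative assumptions rather than values prescribed by the proposal, and the SHA-256 derivation simply stands in for a hash-based naming scheme such as the one in [19].

```python
import hashlib
import math

# Illustrative fragment lengths per connection layer, in hexadecimal characters.
# These values are assumptions chosen only to make the example concrete.
FRAGMENT_LENGTHS = {"edge": 4, "fog": 8, "cloud": 16, "global": 32}
LAYER_ORDER = ["edge", "fog", "cloud", "global"]

def full_id(resource_name: str) -> str:
    """Derive a fixed-length hexadecimal identifier for a resource
    (SHA-256 here stands in for a hash-based naming scheme)."""
    return hashlib.sha256(resource_name.encode()).hexdigest()

def id_for_connection(resource_id: str, layer: str) -> str:
    """Return the ID prefix (the first n fragments) used for a connection
    whose scope is decided by the node at the higher hierarchical level."""
    n_fragments = LAYER_ORDER.index(layer) + 1           # edge=1 ... global=4
    prefix_len = sum(FRAGMENT_LENGTHS[l] for l in LAYER_ORDER[:n_fragments])
    return resource_id[:prefix_len]

def collision_probability(cached_ids: int, hex_chars: int) -> float:
    """Birthday-bound estimate of the probability that at least two of the
    IDs cached by a fog node share the same truncated identifier."""
    space = 16 ** hex_chars
    return 1.0 - math.exp(-cached_ids * (cached_ids - 1) / (2.0 * space))

if __name__ == "__main__":
    rid = full_id("hospital-3/ward-2/sensor-17")         # hypothetical resource
    print(id_for_connection(rid, "fog"))                  # first two fragments only
    print(f"{collision_probability(1_000, 8):.2e}")       # roughly 1.2e-04
```

Under these assumptions, a fog node could use the collision estimate to pick the shortest fragment length whose collision probability stays below a target threshold for its expected cache size.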
4 Evaluation and Results

In this section we describe the experiment used to validate our proposal and the results obtained. Two parameters have been considered in the evaluation: we compare the storage required to store the resource identifiers and the query execution times when the resources use their full identifier in the network and when they use a fraction of it.

In F2C, the resources grouped in the lowest layer of the network hierarchy will be the most challenging to identify. This complexity is caused by the tremendous number of devices concentrated at the bottom of the network topology (users' devices, sensor networks and other IoT artifacts), the lack of control that the service provider will have over those devices, and the highly dynamic network topology caused by the inherent mobility of many of them. Recognizing this, in this section we focus on the IoT layer, hence evaluating the performance of our proposal when using the first ID fragment.
4.1 Experiment Description

In the conducted experiment we used a Raspberry Pi 3 Model B. This device integrates a 1.2 GHz quad-core ARM processor and 1 GB of RAM. The reason for using it is that we consider its specifications to be the minimum hardware requirements a device should meet in order to be considered for the fog node role in an F2C system. The software preinstalled on the Raspberry Pi was Ubuntu 16.04 as the operating system and a SQL Database Management System (DBMS).

We then created five databases and filled them with one million synthetic resource identifiers. The length of the resource identifiers in the first database was set to 128 bytes (in line with the length used in [19]); this first database held the full identifiers. The next four databases stored truncated versions of the identifiers in the first database, truncated at 32, 16, 8 and 4 bytes respectively. In all cases, the identifiers were generated using only the hexadecimal charset.

4.2 Used Storage

In F2C, the IoT layer is the one with the most limited resources. In fact, many of the devices that operate in the lowest layer do not even have the necessary hardware resources to process the data they generate, so effective resource management is a must. In this sense, storage is one of the most constrained aspects of the devices in the IoT layer. An F2C framework that requires excessive storage capacity for the data generated at runtime may prevent a large number of devices from being used as fog nodes, causing, in the worst case, the existing fog nodes to reject new connections because they are overloaded.

The storage required to hold the resource IDs in the fog nodes is the first parameter we evaluated. The results obtained during the validation (Table 1) show that truncating the identifiers that the resources use in the IoT layer reduces the disk space required to store them.

Table 1. Database sizes

  Database    Size (MB)   %
  128 Bytes   162.17      100.00
  32 Bytes     67.09       41.37
  16 Bytes     51.08       31.50
  8 Bytes      42.08       25.95
  4 Bytes      37.06       22.85

Table 1 shows the size in megabytes of the databases previously described. The right-hand column presents the percentage of space required by each truncated database with respect to the database that stores the full (128-byte) resource identifiers. It is worth noting that the difference in megabytes between the databases with 8-byte and 4-byte identifiers is minimal, even though the identifiers stored in the former are longer. This is rooted in the fact that the size of the indexes the DBMS maintains does not depend on the length of the fields in the tables. In all cases, the disk space required to store the identifier fragments is between 58.63% and 77.15% less than the space needed to store the full identifiers.
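As an illustration of the setup described in Sects. 4.1 and 4.2, the sketch below generates one million synthetic hexadecimal identifiers, stores the full and truncated versions in separate databases, and reports the resulting file sizes. The paper does not name the DBMS used, so SQLite is assumed here purely as a stand-in; the table and file names are likewise hypothetical, and the absolute sizes will differ from those in Table 1.

```python
import os
import secrets
import sqlite3

ID_LENGTHS = [128, 32, 16, 8, 4]   # identifier lengths in hex characters (Sect. 4.1)
N_IDS = 1_000_000                  # one million synthetic identifiers

def build_database(length: int, full_ids: list[str]) -> str:
    """Store the identifiers truncated to `length` characters, with an index
    on the ID column so that lookups are comparable across databases."""
    path = f"ids_{length}.db"
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE resources (id TEXT)")
        conn.execute("CREATE INDEX idx_resources_id ON resources (id)")
        conn.executemany(
            "INSERT INTO resources (id) VALUES (?)",
            ((fid[:length],) for fid in full_ids),
        )
    conn.close()
    return path

if __name__ == "__main__":
    # 128 hex characters correspond to 64 random bytes rendered as hex.
    # Reduce N_IDS on memory-constrained hardware such as a Raspberry Pi.
    full_ids = [secrets.token_hex(64) for _ in range(N_IDS)]
    for length in ID_LENGTHS:
        path = build_database(length, full_ids)
        print(f"{length:>3} chars -> {os.path.getsize(path) / 1e6:7.1f} MB")
```

What matters for the comparison is the relative difference between identifier lengths rather than the absolute file sizes, which depend on the DBMS and its indexing scheme.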
4.3 Query Times

One of the main advantages the F2C paradigm offers is the possibility of executing applications and services with lower delay than cloud computing. This opens the door to the development and deployment of all kinds of novel services that require real-time responses, such as e-health services, online video games, earthquake alarm triggers, etc. To achieve this goal, it is imperative that the individual components of the F2C framework are highly efficient and avoid adding delays to internal processes.

In the F2C framework, the IDMS component should be able to identify resources quickly enough to allow devices on the move to switch among different fog nodes without interrupting their activities. This identification process includes a database lookup. In this sense, our proposal aims at reducing the database lookup times by reducing the amount of information the fog nodes store.

In the validation phase, we used the databases described in Sect. 4.1 to measure the lookup times. For each database, we measured ten times the time required to fetch 200, 400, 600, 800 and 1,000 thousand records, and then calculated the averages of the obtained results (Table 2).

Table 2. Query execution times (for each ID length: average time, and percentage relative to the 128-byte identifiers)

  ID length   IDs in the fog node (thousands)
              200       400       600       800       1,000
  128 Bytes   2.97      6.29      9.62      12.93     16.68
              100.00%   100.00%   100.00%   100.00%   100.00%
  32 Bytes    1.51      2.49      3.97      4.95      7.55
              50.87%    39.65%    41.27%    38.28%    45.24%
  16 Bytes    1.29      2.34      3.43      4.92      6.34
              43.30%    37.24%    35.67%    38.01%    38.01%
  8 Bytes     1.26      2.20      2.93      4.42      5.52
              42.51%    34.90%    30.43%    34.16%    33.10%
  4 Bytes     1.16      1.91      3.14      3.98      5.02
              38.99%    30.31%    32.62%    30.80%    30.11%

Table 2 and Fig. 4 summarize the results obtained. For the sake of comparison, the percentages relative to the first database are also included in Table 2. As can be observed, using a fraction of the full resource identifier significantly reduces the time required to search for an item in the database. By using a quarter of the device name (32 bytes), our proposal shows a reduction of up to 49.13% in the search time. In fact, a 32-byte ID is still a large identifier for the lowest F2C layer, which means that the ID length can be reduced even further and, with it, the search time. It is worth noting that, in general, the times obtained with 8-byte and 4-byte identifiers are very similar, which indicates that the lookup time does not decrease proportionally with the identifier length; this is explained by the indexes and primary keys the DBMS uses to speed up data retrieval.

In Fig. 4, the query execution times are presented graphically. The blue bars represent the lookup times in the database that stores the full resource identifiers. It can easily be observed that, in all cases, the time required to search that database is considerably longer than the query execution times when the resources use a fraction of their full ID. The figure also shows that the gap between the full and the truncated identifiers widens as the volume of data to be handled increases.

Fig. 4. Query execution times.

From the results shown in Table 2 and Fig. 4, we can conclude that when the edge devices use a fraction of their identifier instead of the full version of it, the lookup time decreases significantly (between 54.76% and 69.89% for large volumes of data), all of this without affecting the ID uniqueness property, that is, keeping a very low collision probability.
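The measurement loop behind Table 2 can be approximated with the short benchmark below. It assumes the SQLite databases and table layout from the previous sketch, and it reads the procedure as fetching the first N identifier records from each database and averaging ten repetitions; both this reading of the measurement and all names used are assumptions rather than the authors' actual scripts.

```python
import sqlite3
import time

RECORD_COUNTS = [200_000, 400_000, 600_000, 800_000, 1_000_000]

def average_fetch_time(path: str, limit: int, repetitions: int = 10) -> float:
    """Average wall-clock time to fetch `limit` identifier records,
    mirroring the ten repeated measurements described in Sect. 4.3."""
    conn = sqlite3.connect(path)
    total = 0.0
    for _ in range(repetitions):
        start = time.perf_counter()
        conn.execute("SELECT id FROM resources LIMIT ?", (limit,)).fetchall()
        total += time.perf_counter() - start
    conn.close()
    return total / repetitions

if __name__ == "__main__":
    for length in (128, 32, 16, 8, 4):
        path = f"ids_{length}.db"          # databases built by the previous sketch
        times = [average_fetch_time(path, n) for n in RECORD_COUNTS]
        print(length, " ".join(f"{t:6.2f}" for t in times))
```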
5 Conclusions and Future Work

The F2C computing paradigm has arisen as a novel solution that intends both to manage the resource continuum from the edge of the network to the cloud datacenter and to overcome some of the cloud's inherent limitations, for instance by offering remote resources at the edge, with reduced latency, to delay-sensitive services that require real-time responses. However, there is still a list of open challenges that must be addressed before a deployable F2C framework can exist. One of those challenges is the management of resource identities in the network, especially in the lowest hierarchical layer, where most of those resources will be concentrated.

In this paper, we propose a strategy to manage the identity of the resources that consists of fragmenting the unique global resource ID into smaller fragments. Each time a connection to a resource is established, the fog node that aggregates the resource into the network determines the connection scope and, thereafter, the number of fragments required for mutual, unambiguous identification.

The results obtained during the validation phase show that our proposal reduces both the disk space required to store the resource identifiers in the fog nodes and the query execution times, thereby achieving a more efficient use of resources in the IoT layer and streamlining the resource identification process. Future work on this topic includes implementing the proposal in a real scenario to validate its effectiveness in a complete F2C environment, and proposing an algorithm to determine the optimal fragment lengths for each level of the network hierarchy.

Acknowledgment. This work is supported by the H2020 mF2C project (730929), by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund, both under contract TEC2015-66220-R (MINECO/FEDER), and, for Alejandro Gómez-Cárdenas, by the Consejo Nacional de Ciencia y Tecnología de los Estados Unidos Mexicanos (CONACyT) under Grant No. 411640.

References

1. Evans, D.: The Internet of Things: How the Next Evolution of the Internet is Changing Everything (2011)
2. Burkert, A.: Modern Cars' Insatiable Appetite for Data (2017)
3. Mehdipour, F., Javadi, B., Mahanti, A.: FOG-engine: towards big data analytics in the fog. In: 2016 IEEE 14th International Conference on Dependable, Autonomic and Secure Computing, 14th International Conference on Pervasive Intelligence and Computing, 2nd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), pp. 640–646 (2016)
4. Ferrer-Roca, O., Roca, D., Nemirovsky, M., Milito, R.: The health fog. Small data on health cloud. Presented at the International eHealth, Telemedicine and Health ICT Forum for Educational, Networking and Business, Luxembourg, 23 April 2015
5. Firdhous, M., Ghazali, O., Hassan, S.: Fog computing: will it be the future of cloud computing? Presented at the Proceedings of the Third International Conference on Informatics and Applications, Kuala Terengganu, Malaysia (2014)
6. Masip-Bruin, X., Marín-Tordera, E., Jukan, A., Ren, G.-J., Tashakor, G.: Foggy clouds and cloudy fogs: a real need for coordinated management of fog-to-cloud (F2C) computing systems (2016)
7. OpenFog Consortium: OpenFog Reference Architecture for Fog Computing, USA (2017)
8. mF2C Consortium: mF2C Project. http://www.mf2c-project.eu/
9. Sarkar, S., Misra, S.: Theoretical modelling of fog computing: a green computing paradigm to support IoT applications. IET Netw. 5, 23–29 (2016)
10. Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L.: Edge computing: vision and challenges. IEEE Internet Things J. 3, 637–646 (2016)
11.
European Telecommunications Standards Institute: Corporate telecommunication Networks (CN); User Identi?cation in a SIP/QSIG Environment (2004) 12. Cao, J., Xu, L., Abdallah, R., Shi, W.: EdgeOS_H: a home operating system for internet of everything. In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 1756–1764 (2017) 13. Nuñez, D., Agudo, I.: BlindIdM: a privacy-preserving approach for identity management as a service. Int. J. Inf. Secur. 13, 199–215 (2014) 14. Chen, J., Liu, Y., Chai, Y.: An identity management framework for Internet of Things. In: 2015 IEEE 12th International Conference on e-Business Engineering, pp. 360–364 (2015) 15. Fu, Z., Jing, X., Sun, S.: Application-based identity management in M2M system. In: 2011 International Conference on Advanced Intelligence and Awareness Internet (AIAI 2011), pp. 211–215 (2011) 16. Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keranen, A., Hallam-Baker, P.: Naming Things with Hashes (2013) 17. Bouk, S.H., Ahmed, S.H., Kim, D.: Hierarchical and hash based naming with Compact Trie name management scheme for vehicular content centric networks. Comput. Commun. 71, 73–83 (2015) 132 A. Gómez-Cárdenas et al. 18. Savolainen, T., Soininen, J., Silverajan, B.: IPv6 addressing strategies for IoT. IEEE Sens. J. 13, 3511–3519 (2013) 19. Gómez-Cárdenas, A., Masip-Bruin, X., Marín-Tordera, E., Kahvazadeh, S., Garcia, J.: A hash-based naming strategy for the fog-to-cloud computing paradigm. In: Heras, D.B., Bougé, L., Mencagli, G., Jeannot, E., Sakellariou, R., Badia, R.M., Barbosa, J.G., Ricci, L., Scott, S.L., Lankes, S., Weidendorfer, J. (eds.) Euro-Par 2017: Parallel Processing Workshops, pp. 316–324. Springer, Cham (2018) A Novel and Scalable Naming Strategy for IoT Scenarios 133 The IoT and Unpacking the He?alump’s Trunk Joseph Lindley(?) , Paul Coulton, and Rachel Cooper Imagination, Lancaster University, Lancaster, UK {j.lindley,p.coulton,r.cooper}@lancaster.ac.uk Abstract. In this paper we highlight design challenges that the Internet of Things (IoT) poses in relation to two of the guiding design paradigms of our time; Privacy by Design (PbD) and Human Centered Design (HCD). The terms IoT, PbD, and HCD are both suitcase terms, meaning that they have a variety of meanings packed within them. Depending on how the practices behind the terms are applied, notwithstanding their well-considered foundations, intentions, and theory, we explore how PbD and HCD can, if not considered carefully, become He?alump traps and hence act in opposition to the very challenges they seek to address. In response to this assertion we introduce Object Oriented Ontology (OOO) and experiment with its theoretical framing order to articulate possible strategies for mitigating these challenges when designing for the Internet of Things. Keywords: Internet of Things · Privacy by Design · Human-Centered Design 1 Introduction Although the term the Internet of Things (IoT) is employed regularly, particular in discussions relating to emerging technologies, its actual meaning is ambiguous as it is de?ned di?erently depending on who’s using it and in what context. Although it was preceded by other terms such as ubiquitous computing and pervasive computing it has gained traction with a general audience, perhaps because the terms ‘internet’ and ‘things’ are more accessible. 
However, having ambiguity baked in to the term means that ‘the IoT’ is likely to be interpreted di?erently dependent upon the meanings a particular individual might associate with these terms. This ambiguity means there is huge varia- tion within discourses utilizing the term. Although the research presented in this paper is aimed at contributing to practices relating to the design of IoT products and services, it also resonates with other, more general, discussions relating to emerging technologies. In particular it seeks to contribute to the debates about privacy, ethics, trust and security in the IoT [37] and understand potential barriers to adoption that may arise through the establishment of problematic design patterns. Our title is a play on the word trunk being synonymous with suitcase, and makes reference to Hyman Minsky’s term, suitcase words. These words describe complex concepts that, when one tries to de?ne them, reveal a nested series’ of other meanings contained within. The other odd term in the title, He?alump, refers a ?ctional elephant like creature, appearing in A.A. Milne’s books about Winne the Pooh. In one story Pooh and his friend Piglet decide to catch a He?alump in a cunning trap, unfortunately they © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 134–151, 2019. https://doi.org/10.1007/978-3-030-02686-8_11 only succeed in trapping themselves. The irony of this story has given rise to He?alump Traps being used by political journalists to describe strategies in which a politician might set a rhetorical trap to catch their opponent and that ultimately back?res on the trapper, leaving them to appear foolish! Thus, despite their intentions, and often ?ne execution, He?alump traps fail to achieve their aims and instead are detrimental toward the desired outcome. In this paper we illustrate how the suitcase terms IoT, Privacy by Design (PbD), and Human Centered Design (HCD) can, become He?alump traps by virtue of their nested complexities. The paper is structured as follows. First, we discuss PbD, paying particular attention to the linguistic complications when trying to de?ne what it really means using the example of the ambiguity present in the European Union’s invocation of the term in the recently introduced (EU) General Data Protection Regulations’ (GDPR). Next, we discuss the challenge to the well-established paradigms of Human-Centered Design (HCD) resulting from the complexities introduced by networked nature of IoT products and services. Third we argue that, if interpreted hubristically, PbD and HCD can result in unintended consequences, and, in essence, become He?alump traps. Finally, we propose the use of new design research techniques incorporating concepts derived contemporary philosophies of technology that can be used to develop and test strategies when navigating the complexities of the IoT and thus to minimize the risk of becoming caught in a He?alump trap. 2 Privacy by Design (and This by That) It is important to start this discussion by acknowledging that PbD does not exist in isolation; there are other propositions which overlap with it such as privacy, security and/or data protection by default. The semantics of the terms use does not aid our under- standing; for example, con?guring something by default would not the same as creating something in a particular way, or put di?erently, by design. Although, for something to have a default con?guration implies that it must have been designed that way. 
Adding to this confusion is the fact that in English language the word ‘design’ can be used in a multitude of di?erent way to mean very di?erent things, e.g. the designer uses her/his knowledge of design to design a thingamajig, which was part of the ?nal system design (which was built in accordance with the original design schematic). It was perhaps inevitable for confusion to result when the terms appeared in an in?uential report in the form “incorporates Privacy by Design principles by default” [6]. The already murky waters that contain PbD are made more di?cult to navigate when we introduce the complex abstractions like ‘privacy’ and ‘security’. To unpack these very quickly: privacy is not the same as security, but in some circumstances, privacy may be delivered by security and conversely security may be delivered by privacy. It is also evident that disciplinary idiosyncrasies can also come into play when trying to bring some clarity to a particular situation. For example, an engineer may interpret security operationally in terms of a particular implementation, like access control lists, whereas a psychologist may draw their understanding from a psychological theory, such as Maslow’s hierarchy of needs. While both considerations are equally valid even when The IoT and Unpacking the He?alump’s Trunk 135 their epistemological roads intersect, a common understanding will not necessary emerge. These de?nitional complexities are not, in themselves, anything to do with how one delivers PbD, they must be acknowledged within any critical discussion. Whilst the argument in this research is relevant to wider discourses of emerging technology, primarily the speci?c issues we are concerned with are (1) Privacy by Design [6] and (2) Data protection by design and by default as referred to in article 25 of the GDPR [42]. Whilst the term PbD emerged originally in a 1995 report1 it came to prominence in 2012 through the work of Ann Cavoukian and Je? Jonas [6]. Introducing PbD Cavoukian quotes the words of a 13th century Persian poet who posits that to ‘reinvent the world’ one must ‘speak a new language’. The premise is that technological progress is itself a new language that brings with it fundamental challenges to the notion of privacy. Going on to provide more concrete examples, the report describes the use of a one-way hash function to protect data subjects’ privacy so that even if patterns can be observed in the data, it cannot be reverse engineered to reveal the names of the participants. While this, and the other examples provided are compelling they are arguably a little naïve. Although in particular contexts such approaches can protect the privacy of individuals represented in the data in the increasingly heterogeneous contexts the IoT represents they can be extremely vulnerable to exploitation through amalgamation with other, seemingly unconnected, data sources and complete reliance on them could prove detri- mental. In the report Cavoukian builds upon the technical contribution of Je? Jonas to propose seven principles for the creation of systems that are private by design. 
These include: • Full attribution of each data record; • Data is tethered (any changes to data are recorded at the time of change); • Analytics only occur when data has been anonymized; • Tamper-resistant audit can be performed; • Systems are created that tend towards false negative rather than false positive in borderline cases; • Self-correcting conclusions (conclusions can be changed based on new data anal- ysis); • Information ?ows are transparent (data movements should be trackable and traceable —whether that is through a hard copy, appears on monitor, or is sent to another system). These principles are aimed at what the report refers to as ‘sense making systems’, systems that synthesize data from multiple systems such as payroll, customer relation- ship management, ?nancial accounting, in order to reach new work?ow conclusions. While the principles make some sense within the bounded context described, they are regrettably too speci?c to become generally applicable to the heterogeneous user groups and devices found within the IoT. In her discussion of PbD Sarah Spiekermann notes “Data is like water: it ?ows and ripples in ways that are di?cult to predict” [33], the implication being that PbD is rather 1 http://www.ontla.on.ca/library/repository/mon/10000/184530.pdf. 136 J. Lindley et al. idealistic and when implemented in practice can be as simple as the utilizing Privacy- Enhancing Technologies with additional security, with the aspiration being an appa- rently “fault-proof” system. Although such an aim is worthy, and the approach is valid, as she states, “the reality is much more challenging”. Spiekermann problematizes this idealism by re?ecting business models of Google and Facebook. They provide a range of apparently ‘free’ services but “without personal data such services are unthinkable”. She argues that proponents of PbD “hardly embrace these economic facts in their reasoning”. In other words, it may not be possible to create feature rich systems that are pro?table for the companies that supply them without contravening some of PbD’s fundamental ideals. In Cavoukian’s response, whilst broadly agreeing with Spiekermann’s analysis, she also insists “the challenges of PbD are not as great as Spiekermann suggested; the engi- neers I have met have embraced the PbD principles, ?nding implementation not di?cult” [5]. Whilst this may be true, it somewhat misses the more interesting element of Spie- kermann’s analysis which touches on potentially systemic shortcomings at the core of PbD’s rhetoric: a ‘fault-proof’ landscape is unrealistic when the ‘economic facts’ of many business models are not acknowledged. Spiekermann’s critique highlights that to do PbD e?ectively, it must become part of overall organizational culture, cutting across management, ?nance, marketing, design and engineering. This is perhaps the reason behind why PbD stagnates, and struggles to move from principles to practicalities— particularly in consumer goods. An alternative perspective on this echoes Shapiro’s suggestion that neither engineers nor customers are able to properly articulate, under- stand, or analyze the impact of ‘non-functional’ requirements like privacy [32]. These hard-to-grasp requirements operate at a completely di?erent level of abstraction to what either engineers and customers are accustomed to thinking about. To recap, the new language of technology is making our world anew, but, we are not yet ?uent in this emerging language. 
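As a minimal illustration of the kind of one-way hash protection referred to above (and revisited in the next paragraph), the Python sketch below replaces data subjects' names with keyed, irreversible pseudonyms so that patterns remain observable in the records without the names being recoverable. It is a generic illustration under assumed names and keys, not the mechanism actually used in the report, and, as the discussion that follows argues, it does nothing by itself to prevent re-identification once the data is amalgamated with other sources.

```python
import hashlib
import hmac
import secrets

# A per-dataset secret key turns the hash into a keyed pseudonym (HMAC-SHA256);
# without the key the pseudonyms cannot be recomputed or reversed.
DATASET_KEY = secrets.token_bytes(32)

def pseudonymise(name: str) -> str:
    """Replace a data subject's name with an irreversible pseudonym, so that
    records belonging to the same subject can still be linked and analysed."""
    return hmac.new(DATASET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical records: the same subject yields the same pseudonym, so patterns
# in the data survive even though the raw names are never stored.
records = [
    {"subject": pseudonymise("Alice Example"), "clinic_visits": 3},
    {"subject": pseudonymise("Alice Example"), "clinic_visits": 5},
    {"subject": pseudonymise("Bob Example"), "clinic_visits": 1},
]

if __name__ == "__main__":
    for row in records:
        print(row)
```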
While purely technical responses to privacy sometimes appear to o?er faultless solutions (e.g. processing irreversibly hashed data), rarely will such a solution be generalizable across a range of contexts. While principles of PbD appear to be useful mechanisms they can be easily compromised when the complexities of ‘in the wild’ contexts are encountered. Whilst we are not disputing that PbD has demonstrably helped inform the delivery of privacy-aware projects with buy-in from developers, customers, and management alike, such examples appear to be in very speci?c contexts and do not necessarily cut through the aforementioned issues. Although the rhetoric deployed for PbD hints at the practicality of creating a ‘fault-proof’ approach to privacy this fails to appreciate the economic realities of what currently makes data-centric businesses viable. On the 25th May 2018 when GDPR became active the data protection legislation across a large swathe of Europe immediately changed. As GDPR protects citizens regardless of where the data pertaining to them is being held, it has also impacted on any organization who holds data about European citizens. We are yet to fully understand how GDPR will play out in practice, test cases and precedents will need emerge before its full implications are understood. Notwithstanding this uncertainty, GDPR is being cited as a legal framework that will clarify and enforce PbD, because article 25 of GDPR explicitly mentions Data protection by default and design [40]. The opening words of The IoT and Unpacking the He?alump’s Trunk 137 the article say that data controllers must take “the state of the art” approaches of PbD into account however no indication is given to what state of the art might mean in practice [14]. Given that this assertion is made under the heading ‘data protection by design and default’ we might reasonably infer that there is a relationship between the two, although the nature of that relationship is unde?ned. Article 25 also makes reference to the ‘by default’ trope, stating that appropriate measures should be taken to ensure that by default “only personal data which are necessary for each speci?c purpose of the processing are processed”. Thus, it appears that GDPR’s interpretation of data-protection by design, and relatedly by default, is at best ambiguous and certainly does not progress our under- standing of how to e?ectively operationalize the rather abstract principles of PbD. This lack of speci?city with respect to PbD (and its relatives) is not con?ned to the document de?ning GDPR. The UK Information Commissioners O?ce (ICO) which is the UK organization responsible for interpreting and enforcing GDPR calls on data controllers to utilize PbD, but does not pro?er any guidance as to how this may be practically enacted.2 While the de?nitional challenges facing European regulators are undoubtedly signi?cant, by including the terminology within the text of GDPR without attending to PbD’s inherent ambiguity, further challenges are almost certainly abound. 3 Human-Centered Design In his book The Design of Everyday Things [27] Don Norman presented principles for designing ‘things’ in such a way that human interaction with them is smooth and fruitful. Until relatively recently such interactions tended to occur predominantly between users, things and/or systems that were standalone and self-contained. In the book Norman provides numerous examples including a refrigerator, a telephone, and a clock. 
Despite the fact that some of his examples, such as the telephone, depend upon several technol- ogies interacting across a diverse technical infrastructure, the user experience of using the phone is encapsulated within a discrete interface made up of handset, dialer, and ringer. Today, interactions occur in much more complex contexts which present designers with new challenges. The “networki?cation of the devices that previously made up our non-Internet world” [29] is creating the IoT and while, interactions with these devices may appear familiar on the surface they inevitably produce an associated digital residue. This digital residue is data, and in stark contrast to the “visibility, appro- priate clues, and feedback of one’s actions” that Norman highlights as key properties of HCD [27:8–9] the full impact of the data is rarely visible either during or after actual user interactions (with connected, or IoT, devices). While this data is necessary to support business models, to train algorithms and, ultimately, to make stu? work, it is possible that by obscuring agency of underlying data, models and algorithms at the point of interaction, designers are in fact operating against the underlying ideology of HCD. The foundations of HCD are in ergonomics with the aim of supporting the “ways in which both hardware and software components of interactive systems can enhance human-system interaction” [43]. Despite being demonstrably useful [2, 16] this engi- neering derived paradigm relied on simpli?cations of complex contexts [11, 13, 38]. 2 https://ico.org.uk/for-organisations/guide-to-data-protection/privacy-by-design/. 138 J. Lindley et al. These reductive stances are incompatible with other more modern approaches that have become integral to HCD and acknowledge “the coherence of action is not adequately explained by either preconceived cognitive schema or institutionalized social norms” [36:177]. The result is that HCD methods have become extremely diverse, build upon a variety theoretical and epistemological stances, and are applied variously as both an evaluative and a generative tool [13, 23, 34]. The spectrum of approaches to utilizing HCD now includes methodological assemblages that can draw upon ethnography, participatory design, cultural probes, workshop techniques, scenarios, extreme users, and personas. Applied sensitively these techniques can produce designs that are “phys- ically, perceptually, cognitively and emotionally intuitive” [13], while also matching “the needs and capabilities of the people for whom they are intended” [27:9]. Whilst it’s true that “there is no simple recipe for the design or use of human-centered computing” [17], HCD—particularly among the design research community—has become ubiqui- tous is greatly in?uence on the technologies that concurrently we shape, and then ulti- mately shape us. Even amongst this diverse methodological landscape, a core theme that pervades HCD utilization is the axiom of simplicity. This is oft interpreted to mean that HCD should inform the design of services and software that are e?cient, e?ortless, and edifying to use; that fade into the background becoming invisible, and that ensure any complexity is that of the underlying task and not of the tool that has been developed to achieve it [25:197, 26]. 
Norman himself acknowledges that dogmatically blunt inter- pretations of this simplicity axiom can, perhaps unsurprisingly, introduce unintended consequences that drive HCD towards a “limited view of design” and result in analysis preoccupied with narrowly focused “page-by-page” and “screen-by-screen” [24] eval- uations. This narrow focus can sti?e potential users, and/or researchers, form being able to fully intuit a particular designed ‘thing’ on a crucial cognitive, emotional, and percep- tual level. In the hyper-connected and data-mediated assemblages of the IoT, the prev- alent assumption that simpler-is-better is already proving highly problematic as the recent revelations concerning Facebooks use of data illustrate. While some aspects of HCD are worthy and hold fast, the complexity, ubiquity, and interconnectedness of systems—represented by the IoT—means that HCD needs to be reevaluated. In the age of the IoT, whilst we need to re?ect the human centered ideals of HCD, it may be necessary to accept that there are, e?ectively, multiple centers and actants relevant to any given interaction. 4 Hubris and He?alumps The common thread that connects the previous discussions of PbD and HCD relates to the risk that occurs when their principles are interpreted hubristically; with excessive self-con?dence. To illustrate this, take a moment to think about the story of the Titanic. The ship employed cutting edge technology in an e?ort to make as safe as possible and was famed for being ‘unsinkable’. As well as explaining a lack of lifeboats on board, this in?ated con?dence meant that even though a spotter saw the iceberg in good time, the helmsman was never asked to take avoiding action—if the ship is The IoT and Unpacking the He?alump’s Trunk 139 unsinkable, why avoid a sinking hazard? After the tragedy the owners were accused of using misleading rhetoric about her sinkability, in response they pointed out their claim was only that the ship was designed to be unsinkable (as opposed to actually being unsinkable). The tale of the Titanic illustrates that hubristic reliance can, if circumstances conspire, be extremely dangerous. Relying on supposed guidelines and principles for HCD and PbD is, arguably, equivalent to the Titanic’s relying on cutting edge anti-sinking technologies. Hence, we cast HCD and PbD as potential He?alump traps. By solely relying on these approaches —despite their unequivocal worthy aims and demonstrated practical virtues—technol- ogists may inadvertently end up ensnaring themselves by the very issues that HCD or PbD may have sought to avoid (see Fig. 1). The problem, in many ways, is with binary and didactic positions. Describing ships as unsinkable, systems as private, or designs as human centered—is irrational. The results of such irrational beliefs may, at worst, result in tragedies like the Titanic. The IoT is so pervasive that the scope of resulting impacts range from the relative inconsequence of the Mirai botnet taking down Net?ix, through to the destabilization of national infrastructure and potential dissolution of democratic processes. Fig. 1. Depiction of a He?alump Trap. If treated insensitively, ideals like PbD and HCD may coerce technologists to believe that privacy is something that can be ‘achieved’ and a system’s simplicity is analogous to being ‘human centered’. Notions of apparently perfect systems are as dangerous as considering a ship unsinkable; these positions are misconceptions. 
Ship captains, system developers, and He?alump trappers alike; be careful. Don’t suggest your ocean liner is 140 J. Lindley et al. unsinkable, don’t believe your door-lock is uncrackable, don’t attempt to trap the made-up animal—refrain from assuming that it might be feasible to design a computerized device that is perfectly private by design. Do, however, embrace those driving ideals, just with a healthy skepticism towards the hubristic tendencies. In the following we describe theoretically-informed strategies to mitigate the dangers of hubris and He?a- lumps. 5 Tempering the Hubris; Designing a Philosophical Response 5.1 Object Oriented Ontology In the following we introduce Object Oriented Ontology (OOO), a modern philosophy which can help to make sense of the complex heterogeneous contexts emerging from the IoT that are so problematic for PbD and HCD. This framework is enacted with a contemporary speculative design methodology, Design Fiction [7, 19], to develop responses to the problematic aspects of PbD and HCD’s He?alump traps. We are not scholars of philosophy; hence we do not intend to discuss the nuances of OOO’s place within the broader gamut of philosophy and theory. However, in order to add some context in the following we o?er a short introduction to OOO, speci?cally within the context of computing and HCD. Philosophically underpinning HCD’s simplicity axiom in studies of Human- Computer Interaction, Heidegger’s seminal Being and Time argues most objects and tools make most sense in relation to human use. Heidegger uses a hammer as an example, he says that technologies are either ‘ready-to-hand’ (in their normal context of use) or ‘present-at-hand’ (if the ‘norm’ is disrupted, for example if the head fell o? the hammer). The metaphysics of this distinction are fascinating, but the salient issue is that the hammer comes to ‘Be’ through interaction with a human. As such the hammer’s very existence is the product of a correlation between the human mind, and the physical world [3]. This conceptual con?guration described as ‘correlationism’ [15]. What OOO does di?erently is to reject correlationism, and by doing so creates the possibility that objects have realities that are independent from human use and the mind/world correlation. Seen this way anything from a ?ber optic cable, to a blade of grass, to a quantum computer, to an apple pie—may be given agency in its own ontological limelight. If we imagine that every individual concept—the ?ber cable or the blade of grass—giving o? a little light in this way, then we might say their collective hue is the “?at ontology” that scholars of OOO refer to [4]. “In short, all things equally exist, yet they do not exist equally […] This maxim may seem like a tautology—or just a gag. It’s certainly not the sort of quali?ed, reasoned, hand-wrung ontolog- ical position that’s customary in philosophy. But such an extreme take is required for the curious garden of things to ?ow. Consider it a thought experiment, as all speculation must be: what if we shed all criteria whatsoever and simply hold that everything exits, even things that don’t? […] none’s existence fundamentally di?erent from another, none more primary nor more orig- inal.” [3:11] Bogost uses the famously ill-fated video game E.T. the Extra-Terrestrial as an example of how a single thing can be broken into many di?erent types of OOO object. 
He notes The IoT and Unpacking the He?alump’s Trunk 141 that the game is simultaneously: a series of rules and mechanics; source code; source compiled into assembly; radio frequency signals; a game cartridge; memory etched on silicon; intellectual property; arguably ‘the worst game ever made’; a portion of the 728,000 Atari games that were once buried in the ground in New Mexico;3 a conglom- erate of all of these. There is no fundamental thing which de?nes The E.T. video game. Instead it is all of these things simultaneously, and all of them independently of any human interaction. Contemplating what this sort of shift in ontology could mean Bogost muses “the epistemological tide ebbed, revealing the iridescent shells of realism they had so long occluded” [3]. This branch of metaphysics may seem very far removed from the development of technology, however, through a more practically-oriented approach known as Carpentry it can be materialized. Carpentry involves the creation of “machines” that attempt to reveal clues about the phenomenology of objects. While it’s accepted that objects’ experiences can never be fully understood, the machines of carpentry act as proxies for the unknowable. They pro?er a “rendering satisfactory enough to allow the artifact’s operator to gain some insights into an alien thing’s perspective” [3:100]. Sometimes achieved through programming, and sometimes through other practice, “through the making of things we do philosophy” [41]—lending the theory a material tangibility is the kernel of Carpentry. The purpose of Carpentry is to give the otherwise ethereal study of ontology a very practical legitimacy: “If a physician is someone who practices medicine, perhaps a metaphysician ought be someone who practices ontology. Just as one would likely not trust a doctor who had only read and written journal articles about medicine to explain the particular curiosities of one’s body, so one ought not trust a metaphysician who had only read and written books about the nature of the universe.” [3:91] 5.2 Design Fictions All design usually seeks to change the current context, and thus to create futures by answering questions or solving problems [22]. Speculative design is somewhat di?erent, it uses design to pose questions about possible futures, rather than to answer them.4 This family of design practices does not aim to create products for market, or which solve a real problem, instead they use the traditions of design in order to elicit insights and provoke new understandings [1, 8, 9] (a stance that is central to ‘Research through Design’ [10, 12]). The speculative design landscape is quite broad5 however the speci?c approach we employed in this work is Design Fiction. There continues to be much disagreement about the ‘best’ ways to do Design Fiction, but the ‘Design Fiction as World Building’ approach [7] is the one we adopted with this work. Doing Design Fiction this way involves designing a series of artifacts which all 3 4 cf. https://en.wikipedia.org/wiki/E.T._the_Extra-Terrestrial_(video_game). “A/B” is an excellent keyword based summary of the contrast between a?rmative and spec- 5 ulative design [30]. Dunne and Raby’s book [9] provides a thorough overview of speculative design practice and Tonkinwise’s review of the book o?ers some useful critique of speculation tooå [39]. 142 J. Lindley et al. contribute to the same ?ctional world. Individual artifacts act as ‘entry points’ in to the ?ctional world by depicting parts of it at a range of di?erent scales (Fig. 2). 
This results in a reciprocal prototyping effect: the artifacts define the world, the world prototypes the artifacts, which, in turn, prototype the world.

Fig. 2. Design Fiction as World Building

We utilize Design Fiction this way in a form of Bogostian Carpentry. In Bogost's examples he explores the inner world of objects by using computer code. The flexibility of code allows him to, effectively, 'play God' within that realm. The demiurgic quality afforded Bogost by using computer code also exists when building Design Fiction worlds. However, instead of the functions, APIs and code of the computer's domain, it is the essence of Design Fiction worlds, and the designed things that define them, that are the tools of this particular creationist trade.

The World's First Truly Smart Kettle. Employing the world building approach, we attempted to enact Bogostian carpentry in the design of a smart kettle; the kettle is branded as Polly, in reference to the nursery rhyme Polly Put the Kettle On. The contours of Polly's world are crafted through the creation of various artifacts, including a fictional press release for the kettle, packaging materials, and user interfaces. The press release describes many of the kettle's features; these include smart notifications, integration with social media, voice commands, energy tracking, location-based boiling, and the trademarked JustRight smart fill meter. Some of these features are prototyped in user interface designs (e.g. Fig. 3), and the artifacts aim to provide historical context to the Polly world too: the product was originally crowdfunded before subsequently being bought out by Amazon's IoT division; it is regulated by a government organization, and in order to achieve its accreditation it must utilize the Minimum Necessary Datagram Protocol [cf. 20, 22].

Fig. 3. Polly's OOO-inspired timeline and volumetric data graph.

When building Polly's fictional world we worked from the assumption that continuing IoT adoption will result in even more ubiquity of data-collecting devices [35]. Among these, presumably, devices such as kettles will (continue to) collect data too. Today, the visibility of the data shared by these devices is at best opaque and at worst absent, isolating the user from the underlying data transactions. While PbD principles can protect the user from unwanted or nefarious processing of their personal data, on occasions where that sort of processing is necessary to facilitate the device's functional requirements, the better alternative would be to communicate the nature of the data transactions rather than disguising them. We may liken this to an autonomous car that would choose an optimized route to its destination. Most of the time, routes designed to reduce journey times are desirable, but if the car were designed in such a way that it would not reveal precisely what that route was, it would likely engender a feeling of distrust. Responding to this need we constructed two key features in Polly's fictional world. Figure 3 (left) shows a timeline depicting events taking place over the course of a day. From the timeline, we can tell that, in data terms, Polly was dormant for over 4 h since the 'daily cloud pingback', which uploads usage data to the cloud and downloads configuration, security, and update data from the cloud. We can also see that Polly was removed from its base and partially refilled, at which point the kettle's software anticipates that it may be boiled soon.
We can see that removing the kettle from the base and refilling it result in the immediate sharing of data to the cloud. The anticipation event, however, does not share data to the cloud, but it does share data with the home's smart meter and other appliances to inform them of an impending power-consumption spike. The right-hand side of Fig. 3 depicts the volume of the data uploaded from Polly, downloaded to Polly, and moving around the local network. This display differs from the timeline in that we cannot tell from it why data is moving around. However, what we can tell is the relative amount of data this smart kettle consumes and generates, as well as the relative volume of those transactions. Both displays are intended to be used in conjunction with each other such that Polly is quite transparent about what it communicates and for what purposes. Based on the examples we can infer that Polly downloads much less data than it uploads. The specific reason for the upload/download disparity is not important; rather, the takeaway point is that by utilizing Carpentry and Design Fiction, considering the reality of the kettle itself and giving the kettle's Object Oriented perspective as much weight as the user's perspective and the manufacturer's perspective, a more egalitarian interface can be designed that doesn't detract from the usability forwarded by HCD or the privacy credentials of PbD, but that does reveal the reality of what is happening and why, thus detracting from the dangers of hubris.

Orbit, a Privacy Enhancing System. This project was in part motivated by a desire to explore how the European Union's GDPR may impact on user/technology interactions. We were minded to develop a system that could obtain GDPR-compliant consent in a modern, simple and transparent way. Although legal precedents are yet to be tested and established in court, the articles of the GDPR theoretically protect various rights, including: the right to be aware of what personal data is held about an individual; the right to access personal data; the right to rectify inaccurate data; the right to move personal data from one place to another; the right to refuse permission for profiling based on personal data; and the right that any consent obtained relating to personal data must be verifiable, specific, unambiguous and given freely. The process by which users consent to have their data collected and processed is an area of particular contemporary relevance. The alleged involvement of British marketing company Cambridge Analytica in Donald Trump's election victory, and how, if this is shown to be true, consent was gained for the collection and processing of data from Facebook, is one factor driving interest in consent. Although some advances have been made in recent years (for example, pre-checked boxes and non-consensual cookie usage were both outlawed in Europe in 2011; see http://www.bbc.co.uk/news/world-europe-15260748), tick boxes for users to indicate they have understood and agree to conditions of use are still the norm. There are fundamental problems with this approach, the most obvious of which being that while users often tick boxes saying they have read terms and conditions, the tick is no indication of whether they have actually read the text, nor whether they have understood it. In one study only 25% of participants looked at the agreement at all, and as little as 2% could demonstrate comprehension of the agreement's content [28].
User agreements that obtain a wide spectrum of consent, whereby a user gives all the permission a device or service could ever possibly need, stifle users' agency to be selective about which features of a system they would like to use (which in turn seems to contravene the GDPR-protected right to specific and unambiguous consent). These systems also fail to account for changes over time; once consent has been gained it is frequently impossible (or very difficult) to remove or change the nature of the consent. Again using the Design Fiction world building approach, we decided to build the world around an IoT lock device. Inspired by IoT locks that already exist on the market (cf. http://uk.pcmag.com/surveillance-cameras/77460/guide/the-best-smart-locks-of-2017), the fictional lock was imbued with the following features:

• Using short-range radio instead of a key;
• Location-based access (geofencing);
• Temporary access codes (for guests);
• Integration with voice agents (e.g. smart assistants);
• Integration with other services such as If This Then That (IFTTT).

Each feature has a different relationship with the data collected, where that data is stored, and how it is processed. Using a short-range radio (NFC) instead of a key relies only on data inside the user's own network; location-based access requires that data be accessed and stored by the lock company; and utilizing services like IFTTT would lead to data being shared with any number of 3rd parties. Given that our purpose was to explore GDPR-compliant consent mechanisms, our crafting of the Design Fiction paid only brief attention to the technical implementation (we assumed that the lock would utilize an IoT radio standard such as ZigBee and that suitable APIs facilitate integration with external services such as IFTTT). Our original aim with this project was to design a map that could be used during a consent procedure to show a user what data goes where, so that they would be "informed by design" [21]. However, this aim was immediately challenged by the vast number of possible variations, even within a relatively small and straightforward IoT context. Figure 4 illustrates a scenario with an IoT lock which has been configured to turn on a smart lighting system when the user opens their door. While the cause and effect are simple and clear to the user (opening the door makes the lights turn on), there are actually several cloud-based services behind the scenes that are necessary to make the hardware work. There may also be unknown 3rd parties using the data too (e.g. data brokers). Hence, to turn this into a map that details precisely where data goes, when, and in what circumstances is simply not possible. A significant factor driving this challenge is that each specific situation needs to be treated as an ad hoc scenario, as something completely unique [31].

Fig. 4. Diagram showing how a user opening the door may trigger a number of possible data flows around the constellation, and that there is no single end point.

In order to progress, some of the design parameters had to be amended. Initially we made our investigation more tightly scoped: rather than addressing GDPR compatibility per se, we focused solely on personal identifiability. Next, it was necessary to forget the reducible concept of a map that would represent specific and quantifiable measures of probable risk, and accept that any map would require much more extensive use of 'shades of grey'.
As a result of these changes, our experiment with OOO went in directions we had not predicted. Our original intention was that OOO's tiny ontologies would provide us with the means to investigate the lock, the associated data streams, and potential users; our attempt at carpentry, we thought, would lead us to a deeper understanding of those objects directly. Contrastingly, however, what came to pass is that our carpentry resulted in the creation of an entirely original object (complete with its own tiny ontology). The purpose of this new object is to provide a new lens for looking at collections of IoT devices, platforms, the data that mediates between these, and the people that use them. These new objects, referred to as Orbits, communicate the relative likelihood that a person may be identified based upon device use. They present this in a fashion that distinguishes between data held locally, with known providers, or with unknown 3rd parties. These 'maps' provided some means to bridge between the vast gamut of possibilities in the computer-world and the succinct concreteness of judging acceptability in the human-world. They facilitate value judgements. The privacy Orbits map IoT systems and the data they utilize, and communicate the likelihood of identifiability based on data held in different places. The 'levels' (i.e. each concentric circle) represent data that is held locally, with known providers, or with unknown 3rd parties (see labels in Fig. 5). The definition (blurriness or sharpness) at the edge of each level describes the probability, or certainty, of the user being identifiable based on the data at that specific level. If the inner-most level has a pin-sharp edge, then it is almost definite that the user could be identified based on those data (e.g. the right-hand diagram's 1st level in Fig. 5). Blurrier levels mean that the chance of identifiability is reduced (e.g. the left-hand diagram's 3rd level in Fig. 5).

Fig. 5. Example identifiability Orbits (the name 'Orbit' stems from a visual similarity to the diagrams used in the Bohr model of the hydrogen atom (https://en.wikipedia.org/wiki/Bohr_model)).

The Design Fiction world we had created was a useful tool to then import the identifiability Orbits into, and to prototype how they might be used. We created a short film that shows a user installing a new IoT smart lock device in their home (https://youtu.be/A37SmnNFstA) using a voice interface and a supporting app. In essence the user is provided with a slider which enables or disables all the possible functions of the lock, and the Orbits communicate how the associated changes in data flows impact on identifiability. The same scenario may be extended to show the implications of dynamically modifying settings, for example to temporarily provide access to a delivery agent using a system similar to Amazon Key (https://www.theverge.com/2017/10/25/16538834/amazon-key-in-home-delivery-unlock-door-prime-cloud-cam-smart-lock). If the user has configured their system for maximum privacy (or, minimal identifiability), then Orbits could be used to temporarily provide access to the 3rd party and to show the user what the impact on data flows would be. Though this interaction is clearly achievable, it raises a host of other questions relating to the temporality of consent. For example, if a user gives consent for their data to be used by a 3rd party for a few hours, what happens to that data after those hours have elapsed?
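As a purely illustrative sketch of the kind of data model that might sit behind an Orbit-style display (this is our reading of the description above, not an artifact from the design fiction itself; the class and field names are hypothetical), each concentric level could carry an estimated identifiability probability from which the edge definition is derived:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrbitLevel:
    """One concentric level of an identifiability Orbit."""
    name: str              # "local", "known providers" or "unknown 3rd parties"
    p_identifiable: float  # estimated probability (0..1) that the user can be
                           # identified from data held at this level

    @property
    def edge_definition(self) -> float:
        """Sharper edge (closer to 1) = near-certain identifiability;
        blurrier edge (closer to 0) = identifiability unlikely."""
        return self.p_identifiable


def render_orbit_text(levels: List[OrbitLevel]) -> str:
    """Textual stand-in for the graphical Orbit, innermost level first."""
    rows = []
    for i, level in enumerate(levels, start=1):
        rows.append(f"level {i} ({level.name}): identifiability "
                    f"{level.p_identifiable:.0%}, edge definition {level.edge_definition:.2f}")
    return "\n".join(rows)


# Hypothetical lock configuration: key data stays local, little reaches 3rd parties.
print(render_orbit_text([
    OrbitLevel("local", 0.95),
    OrbitLevel("known providers", 0.40),
    OrbitLevel("unknown 3rd parties", 0.05),
]))
```

In a working interface, toggling a feature via the consent slider would presumably just update these per-level probabilities and redraw the Orbit.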
6 Discussion and Conclusions

Our OOO-informed Design Fictions work within the boundaries of the following sentiments: "the Internet must be grasped in metaphorical terms" [29] and "Security by design and privacy by design can be achieved only by design. We need a firmer grasp of the obvious" [32]. Of course, acting on such sentiments is easier said than done, particularly when each of the constructs that we deal with (IoT, PbD and HCD) is a suitcase term with multiple possible meanings. Because of this network of problematic aspects, we assert that drawing on philosophy, and employing speculative design, is a productive way to begin to unpack the problem (as opposed to more directly applied/engineering-led approaches). The examples we have provided above are intended to be used in two ways. First, we wish to forward the method itself: enacting Bogostian Carpentry as a way of practicing OOO to address the complexities of PbD and HCD in an IoT context. This conclusion is relatively straightforward; we invite other researchers and technologists to apply a similar method and in doing so research the concepts further. Second, using Design Fiction as a method of Research through Design [10, 12], we offer the following primary contributions which may be directly applied by technologists.

Augmenting HCD with Constellations. Our critique and exploration of HCD is not meant unkindly. We acknowledge and applaud the rich history that HCD has, and rather than calling out shortcomings we wish to augment it for the 21st century. Thus, we propose the 'Constellation' design metaphor. This is a wrapper for the complexities of OOO and calls upon designers, developers and analysts to understand and acknowledge multiple different perspectives in their products. Just as the constellations in the night sky appear different depending on where you stand, the constellations of devices, data, networks, and users of the IoT appear different depending on who you are. Rather than obfuscating this complexity, interfaces such as those exemplified in Polly and Orbit should communicate and reveal the complexity so as to inform all parties of any relevant others' interests, activities, and agency. In doing so, the otherwise well-developed tools in HCD's toolbox may be utilized and leveraged in order to produce technologies that deliver on the promise of the IoT without compromising users' interests.

Humbling the Hubris; Toward Informed by Design. Precisely echoing our exploration of HCD, the perspective we present on PbD is not a scornful one. However, we cannot escape the temptation to use guidelines and principles as a kind of 'safety blanket' beneath which technologists may hide, hubristically arguing that 'because I have ticked the boxes my system design is good enough to protect privacy'. Systems should be designed in such a way that the potential conflation of understanding relating to privacy, security, and data protection by design (and/or by default) is reduced; this may be achieved by purposeful disambiguation.
This disambiguation may involve acknowledging that manufacturers cannot guarantee total privacy and explaining the factors which underpin that uncertainty (as demonstrated in the privacy Orbits in particular). The complexities of non-functional requirements, particularly in IoT contexts, should be approached heuristically; users, and every other actor in the given constellation, should be given the agency to understand any given situation for themselves.

Avoid Heffalump Traps. Adoption of IoT devices has unequivocal societal and economic benefits, but to capitalize on those benefits designers, engineers and policy-makers need to set aside beliefs that are founded on the conceptual possibility of 'perfect' systems. Such beliefs are incongruous with the unavoidable realities of privacy, trust, and security issues. Instead, the IoT needs to be designed with a considered approach that accepts that IoT devices definitely do pose problems for individuals' privacy, but that those problems can be tempered by subtly shifting our design paradigms such that they incorporate constellations of meaning and inform all participants in a constellation of their roles within it. To reinvent the world, we must speak a new language, and that language should ensure that Heffalump traps are not part of the vernacular.

Acknowledgements. This research was supported by the RCUK Cyber Security for the Internet of Things Research Hub PETRAS under EPSRC grant EP/N02334X/1.

References
1. Auger, J.: Speculative design: crafting the speculation. Dig. Creat. 24(1), 11–35 (2013). https://doi.org/10.1080/14626268.2013.767276
2. Bevan, N.: How you could benefit from using ISO standards. In: Extended Abstracts of the ACM CHI 2015 Conference on Human Factors in Computing Systems, pp. 2503–2504 (2015). https://doi.org/10.1145/2559206.2567827
3. Bogost, I.: Alien Phenomenology, or What It's Like to Be a Thing. University of Minnesota Press, Minneapolis (2012)
4. Bryant, L.R.: Democracy of Objects. Open Humanities Press, London (2011). https://doi.org/10.3998/ohp.9750134.0001.001
5. Cavoukian, A.: Operationalizing privacy by design. Commun. ACM 55(9), 7 (2012). https://doi.org/10.1145/2330667.2330669
6. Cavoukian, A., Jonas, J.L.: Privacy by Design in the Age of Big Data (2012)
7. Coulton, P., Lindley, J., Sturdee, M., Stead, M.: Design fiction as world building. In: Proceedings of the 3rd Biennial Research Through Design Conference (2017). https://doi.org/10.6084/m9.figshare.4746964
8. Dunne, A.: Hertzian Tales: Electronic Products, Aesthetic Experience, and Critical Design. The MIT Press, London (2006)
9. Dunne, A., Raby, F.: Speculative Everything. The MIT Press, London (2013)
10. Frayling, C.: Research in art and design. R. Coll. Art Res. Pap. 1(1), 1–9 (1993)
11. Gasson, S.: Human-centered vs. user-centered approaches to information system design. J. Inf. Technol. Theory Appl. 5(2), 29–46 (2003)
12. Gaver, W.: What should we expect from research through design? In: Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems - CHI 2012, p. 937 (2012). https://doi.org/10.1145/2207676.2208538
13. Giacomin, J.: What is human centred design? Des. J. 17(4), 606–623 (2014). https://doi.org/10.2752/175630614X14056185480186
14. Von Grafenstein, M., Douka, C.: The "state of the art" of privacy- and security-by-design (measures). In: Proceedings of MyData (2017)
15. Gratton, P., Ennis, P.J.: The Meillassoux Dictionary. Edinburgh University Press, Edinburgh (2014)
16. Jokela, T., Iivari, N., Matero, J., Karukka, M.: The standard of user-centered design and the standard definition of usability. In: Proceedings of the Latin American Conference on Human-Computer Interaction - CLIHC 2003, pp. 53–60 (2003). https://doi.org/10.1145/944519.944525
17. Kling, R., Star, S.L.: Human centered systems in the perspective of organizational and social informatics. ACM SIGCAS Comput. Soc. 28(1), 22–29 (1998). https://doi.org/10.1145/277351.277356
18. Lindley, J., Coulton, P.: On the Internet Nobody Knows You're a Whatchamacallit (or a Thing). Making Home: Asserting Agency in the Age of IoT Workshop (2017). http://eprints.lancs.ac.uk/84761/1/On_the_Internet_Everybody_Knows_Youre_a_Thing.pdf
19. Lindley, J., Coulton, P.: Back to the future: 10 years of design fiction. In: Proceedings of the 2015 British HCI Conference, pp. 210–211 (2015). https://doi.org/10.1145/2783446.2783592
20. Lindley, J., Coulton, P., Cooper, R.: Why the Internet of Things needs object orientated ontology. Des. J. (2017). https://doi.org/10.1080/14606925.2017.1352796
21. Lindley, J., Coulton, P., Cooper, R.: Informed by design. In: Living in the Internet of Things: PETRAS Conference (2018)
22. Lindley, J., Sharma, D., Potts, R.: Anticipatory ethnography: design fiction as an input to design ethnography. In: Ethnographic Praxis in Industry Conference Proceedings 2014, vol. 1, pp. 237–253 (2014). https://doi.org/10.1111/1559-8918.01030
23. Macdonald, N., Reimann, R., Perks, M., Oppenheimer, A.: Beyond human-centered design? Interactions (2005). https://doi.org/10.1145/1013115.1013184
24. Norman, D.A.: HCD Harmful? A Clarification - jnd.org. http://www.jnd.org/dn.mss/hcd_harmful_a_clari.html
25. Norman, D.A.: The Invisible Computer: Why Good Products Can Fail, the Personal Computer is So Complex, and Information Appliances are the Solution. The MIT Press, London (1998)
26. Norman, D.A.: Human-centered design considered harmful. Interactions 12(4), 14 (2005). https://doi.org/10.1145/1070960.1070976
27. Norman, D.A.: The Design of Everyday Things, Revised edn. Basic Books, New York (2013)
28. Obar, J.A., Oeldorf-Hirsch, A.: The biggest lie on the internet: ignoring the privacy policies and terms of service policies of social networking services. In: The 44th Research Conference on Communication, Information and Internet Policy (2016). https://doi.org/10.2139/ssrn.2757465
29. Pierce, J., DiSalvo, C.: Dark clouds, Io$#!+, and ? [Crystal Ball Emoji]: projecting network anxieties with alternative design metaphors. In: Proceedings of the 2017 Conference on Designing Interactive Systems, DIS 2017, pp. 1383–1393 (2017). https://doi.org/10.1145/3064663.3064795
30. Raby, F., Dunne, A.: A/B (2009). http://www.dunneandraby.co.uk/content/projects/476/0. Accessed 27 Oct 2014
31. Schraefel, M.C., Gomer, R., Alan, A., Gerding, E., Maple, C.: The Internet of Things: interaction challenges to meaningful consent at scale. Interactions 24(6), 26–33 (2017). https://doi.org/10.1145/3149025
32. Shapiro, S.S.: Privacy by design. Commun. ACM 53(6), 27 (2010). https://doi.org/10.1145/1743546.1743559
33. Spiekermann, S.: The challenges of privacy by design. Commun. ACM 55(7), 38 (2012). https://doi.org/10.1145/2209249.2209263
34. Steen, M.: Tensions in human-centred design. CoDesign 7(1), 45–60 (2011). https://doi.org/10.1080/15710882.2011.563314
35. Sterling, B.: The Epic Struggle of the Internet of Things. Strelka Press, Moscow (2014)
36. Suchman, L.: Human-Machine Reconfigurations: Plans and Situated Actions. Cambridge University Press, Cambridge (2007)
37. Taylor, P., Allpress, S., Carr, M., Norton, J., Smith, L.: Internet of Things: Realising the Potential of a Trusted Smart World (2018). https://www.raeng.org.uk/publications/reports/internet-of-things-realising-the-potential-of-a-tr
38. Thomas, V., Remy, C., Bates, O.: The limits of HCD. In: Proceedings of the 2017 Workshop on Computing Within Limits - LIMITS 2017, pp. 85–92 (2017). https://doi.org/10.1145/3080556.3080561
39. Tonkinwise, C.: How we intend to future: review of Anthony Dunne. Des. Philos. Pap. 12(2), 169–187 (2014). https://doi.org/10.2752/144871314X14159818597676
40. Vollmer, N.: Article 25 EU General Data Protection Regulation (EU-GDPR) (2017). http://www.privacy-regulation.eu/en/article-25-data-protection-by-design-and-by-default-GDPR.htm. Accessed 15 Jan 2018
41. Wakkary, R., Oogjes, D., Hauser, S., Lin, H., Cao, C., Ma, L., Duel, T.: Morse things: a design inquiry into the gap between things and us. In: Proceedings of the 2017 Conference on Designing Interactive Systems, pp. 503–514 (2017). https://doi.org/10.1145/3064663.3064734
42. Summaries of Articles contained in the GDPR. http://www.eugdpr.org/article-summaries.html. Accessed 15 Sept 2017
43. ISO 9241-210: Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems. International Organization for Standardization (2015). https://www.iso.org/standard/52075.html

Toys That Talk to Strangers: A Look at the Privacy Policies of Connected Toys

Wahida Chowdhury
University of Ottawa, Ottawa, ON, Canada
Wahida.Chowdhury@hotmail.ca

Abstract. Toys that are connected to the Internet are able to record data from users and share the data with company databases. The security and privacy of user data thus depend on companies' privacy policies. Though there is rising concern about the privacy of children and parents who use these connected toys, there is a scarcity of research on how toy companies are responding to the concern. We analyzed the privacy policies of 15 toy companies to investigate the ways toy companies publicly document digital standards of their connected products. Our results show that most toy companies are either unclear or do not mention in their privacy policy documents how their toys protect the security and privacy of users. We recommend measures that toy companies may adopt to explicitly respond to security and privacy concerns so that parents can make informed decisions before purchasing connected toys for their children.

Keywords: Connected toys · Smart toys · Internet of Things · Information privacy · Data security · Privacy policies · Digital standards · Children · Parents

1 Introduction

Toys that gather information from owners via microphone, camera or user inputs, and share the information via the Internet with whomever these toys are connected to, are known as connected toys. These toys may replace traditional friends by being highly interactive, such as by recording the child's preferences and by talking back to the child. These toys may also replace traditional babysitters and keep the child busy when parents are working. Toy companies quickly noted these benefits and advertised their connected products to children and parents while obscuring the associated risks to privacy and data security.
For example, Edwin the Duck uses Bluetooth technology to broadcast lullabies to its young users; however, the toy company also collects and retains everything the child says and shares that information with "trusted" third parties. The purpose of our research was to investigate the extent to which connected toy companies respond to the benefits versus the threats to consumers' privacy and data security. We analyzed the privacy policies of 15 connected toys; the connected products were selected from the privacy guide developed by the Mozilla Foundation, a not-for-profit organization that supports and promotes the use of connected products. We asked 16 questions about the privacy and data security of each product and looked through the manufacturers' privacy policies for answers. The results provide a snapshot of the informational practices of the connected toy companies, and recommend ways to make privacy policies more explicit so consumers can make informed decisions before purchasing.

2 Literature Review

Connected toys relate to 'a future in which digital and physical entities can be linked, by means of appropriate information and communication technologies, to enable a whole new class of applications and services' [1]. A wide variety of toys fall under the domain of connected toys. Some of these toys are connected to voice and/or image recognition software (e.g. Hello Barbie™ or the Hatchimals); some are connected to app-enabled robots and other mechanical toys (e.g. Dash and Dot); and others are connected to video games (e.g. Skylanders or Lego Dimensions) [2]. Some connected toys are connected to the Internet but do not simulate human-like behaviour; some toys simulate human interaction by talking to users; and other toys, such as connected robots, can be coded by users to perform novel activities [3]. Mascheroni and Holloway [3] identified articles about connected toys from 12 countries (Australia, Austria, Finland, Germany, Italy, Lithuania, Malta, Portugal, Romania, Serbia, Slovenia and Spain), and documented the benefits of connected toys as reported by parents. The benefits included the development of digital literacy, creativity, motivation to learn, reading and writing literacy, social skills, physical activity, etc. Despite the benefits, however, concerns about the security and privacy of users (who are primarily children) have been documented in the literature since the early days of connected toys [4]. Concerns about children's security and privacy were already in place as social networking, gaming, and other websites gathered, stored, and shared data from child users with third parties, often without the child users' knowledge or consent [5]. Connected toys intensified the concerns by making data collection from children easier (such as by microphone, camera, location tracker, and movement detectors) and by being able to collect more personal data (such as by being able to follow child users everywhere and by being always "on"). These developments exacerbated the risks of easy access to personal information, simply by hacking company databases. Recent examples include the hacking of data collected by the connected toys Hello Barbie and VTech from millions of child users [2]. The security and privacy concerns imply that toy makers should incorporate effective measures from inception to completion of the development process of connected toys [6].
Our research looks into the privacy policies of toy companies to report how the companies are addressing public hopes and fears surrounding connected toys.

3 Methodology

The Mozilla Foundation published a report, Privacy Not Included, in December 2017 that reviewed the openly accessible privacy policies of different connected products. The report aimed to draw buyers' attention to three questions related to privacy and security before purchasing the products: (1) How do the products spy on users? (2) What information about the users do the products collect? and (3) What could happen to users if data breaches occur? For example, the Mozilla guide reports that the connected toy Dash the Robot is a one-eyed robot that can sing, dance, and play to give a highly interactive and fun experience to children; however, parents should be warned that the robot can spy on children via its microphone and that parents have no control over the data that the robot collects. To extend the Mozilla product reviews and provide a more in-depth synopsis of users' privacy and data security related to connected products, we conducted further analyses of the privacy policies of 15 toys and game consoles listed in the Mozilla report. These connected products were: Smart Letters, Edwin the Duck, Adidas miCoach Smart Soccer Ball, Ozobot Evo, Beasts of Balance, Toymail Talkie, Sphero SPRK+, Osmo, Dash the Robot, BB-8 by Sphero, Airjamz Air Guitar, Hello Barbie, Microsoft Xbox One, Sony PlayStation 4, and Nintendo Switch. We developed 16 distinct questions from the open-access Digital Standard, created by Consumer Reports, Disconnect, Ranking Digital Rights and the Cyber Independent Testing Lab, to evaluate the privacy and security of the 15 connected toys. For example, we investigated how secure user information is when using a connected product; we looked through the product's privacy policies to determine if the company routinely audits user data and restricts third-party access to the data. The various questions addressed what privacy measures were put in place, what privacy controls were available, and what kind of information the companies gathered from users and disclosed to third parties.

4 Results

4.1 How secure is users' data?
Almost all the companies we studied claimed that they take steps or comply with standards to protect user data, but they are not always clear about what steps they take or what standards they follow. Furthermore, none of the companies we studied are confident that they are hack-proof, and they admit that security breaches can still happen.

4.2 Do users need to make a password?
Most companies require users to make a password. However, passwords are not required to be complex or secure. This means that user information could be easily hacked.

4.3 Does the company encrypt users' information?
Only four (27%) of the companies we studied fully encrypt user data; others partly encrypt user data or do not encrypt at all. This means that user information could be easily understood if hacked.

4.4 Can users control the data that the company collects?
About half the companies we studied (53%) do not mention if users can control their own data. In fact, a few companies, such as the maker of the "Osmo" toy, automatically collect information without user control.

4.5 Can users delete their data when they leave the service?
Almost all the companies we studied allow users to delete data when they leave services, but perhaps not completely.
For example, companies may retain non-personally identifiable data, as well as cached or backup copies of user data that the companies are not explicit about. This means that even if users leave a service, their information could be hacked.

4.6 Do users know what information the company collects?
Almost all the companies we studied give users a snapshot of what information is collected from them. However, the hidden rules are often too complex to understand and are easy to overlook.

4.7 Does the company collect only the information needed for the product to function?
Almost all the companies we studied collect more information from users than is needed to make their product work.

4.8 Is users' privacy protected from third parties by default?
None of the companies we studied protect user data from third-party companies by default. Some companies allow users to review and change their privacy settings. However, it is not clear to what extent users are able to protect their privacy without losing access to services.

4.9 How does the company use users' data?
The privacy documents of almost all the companies we studied explicitly state how they might use user data. However, most companies leave the responsibility on users to control their own privacy, and users are warned that they might not get the best service if they restrict access to their data.

4.10 Does the company have a privacy policy document?
All the companies we studied have privacy policy documents. However, the documents are often very long, written in intangible language, and often do not answer important questions.

4.11 Will users receive a notification if the company changes its privacy policy?
Less than half (40%) of the companies we studied send notifications if their privacy policies change. Most companies either do not mention any change or simply update the date at the top of their policy documents, which are very unlikely to be read twice by users to notice the change.

4.12 Does the company comply only with legal and ethical third-party requests for users' information?
Only 27% of the companies we studied explicitly mentioned that they comply only with legal and ethical third-party requests for user information. Most companies claim to share non-identifiable information or are not explicit about how information requests are handled.

4.13 Does the company require users to verify identity with government-issued identification, or with other forms of identification that could be connected to users' offline identity?
None of the companies we studied require users to verify identity with government-issued identification, indicating that users can register for services under false names.

4.14 Does the company notify users of any unauthorized access to data?
Only two (13%) of the companies we studied notify users of security breaches. This means that users may continue to use connected products even after these are hacked.

4.15 Is the company transparent about its practices for sharing users' data with the government and third parties?
Only four (27%) of the companies we studied were transparent about sharing practices with the government and third parties.

4.16 Does the company send notifications if the government or third parties request access to users' data?
Only three (20%) of the companies we studied notify users of third-party requests. This means that third parties may collect users' information without their awareness.
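For clarity, the percentages quoted above are simply counts out of the 15 products reviewed. The short Python sketch below is purely illustrative (the code is not part of the study; only the counts and question labels are taken from the findings above) and shows how such a tally converts into the reported figures, e.g. 4/15 ≈ 27%, 2/15 ≈ 13%, 3/15 = 20%:

```python
# Hypothetical tally of the 16-question review: for each question, count how
# many of the 15 products' policies clearly satisfy it, then express the count
# as a percentage of the products reviewed.
N_PRODUCTS = 15

counts = {
    "fully encrypts user data (4.3)": 4,
    "transparent about government/3rd-party sharing (4.15)": 4,
    "notifies users of unauthorized access (4.14)": 2,
    "notifies users of 3rd-party data requests (4.16)": 3,
}

for question, n_yes in counts.items():
    print(f"{question}: {n_yes}/{N_PRODUCTS} = {n_yes / N_PRODUCTS:.0%}")
```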
5 Discussion

Childhood experiences are rapidly becoming digital, including connected toys and games that let children connect to strangers effortlessly from the comfort of their home. Although this may seem fun and safe, our findings indicate that none of the toys provided satisfactory answers to all 16 questions related to privacy and data security. There remain a variety of different ways a connected toy company may gather information, such as recording users' preferences, tracking a user's IP address and turning on a device's camera every time the toy is used. The security of user information thus relies on the security of the databases of a connected toy company or of the third parties that the company shares information with. If hackers or even employees access the databases with any wrong motive, from having fun to stealing money to initiating a cyber-war, strangers can talk back to the young users and make them do inappropriate things. To prevent data breaches, the privacy policy documents of the 15 toy companies that we analyzed claimed to have privacy measures in place; this might make parents feel relieved and trust the companies to be responsible caretakers of their children. However, the privacy policies of almost all the companies accepted that their databases might not be secure enough to prevent data breaches. Companies seem to posit that users are responsible for their own security. However, users were often threatened with losing services if they exercised control of their privacy, for example if they did not share data with third parties. The privacy policies of each company attempt to document their data collection and sharing practices, which might give the feeling of making an informed decision about purchasing the company's products. However, the policies do not follow a standardized format and are not always written in a way that the general user could understand. Also, the definitions of privacy measures such as data control and data collection are not standardized between companies. This means that many parents may not be aware of the information that companies gather about their children, which may limit their ability to make fully informed decisions about the products they are purchasing. For example, when a parent signs up for an account for various toys or consoles, certain information is asked of them, but the sign-up mechanisms do not draw the parent's attention to the fact that the toy's microphone may be accessed or that the child's IP address and/or Wi-Fi information may be stored on the company's servers. Furthermore, users may not read lengthy documents, such as ambiguous privacy policies, that describe before purchase what a certain connected toy does. For example, users may ignore an ambiguous warning that a toy may be harmful if it does not state clearly why or how the toy may be harmful. Users may also feel that if a product is on the market, the company must have done security checks. For example, if a new car is on the market, users should not have to question whether the car is safe to drive, let alone investigate whether children's toys are safe to play with.

6 Recommendations for Toy Companies

Our findings suggest that a Frequently Asked Questions (FAQ) document should accompany privacy policy documents, itemizing privacy-related questions the way we did in this report, so that it is easier for people to see how their information is collected, used and disclosed.
Secondly, if the concerns stem from sharing data with company databases, toy companies should reconsider the necessity of sharing data with remote databases, which have the possibility of being hacked, rather than storing data locally within the toy itself, where it can only be accessed if the child loses the toy. Furthermore, more evaluations need to be done as new toys are developed to ensure that children's information is given the highest level of protection. Manufacturers should strive to make connected toys more reliable and capable each year, while service providers, software engineers, governments, private organizations, and technical experts should strive to prevent and solve security and socio-economic problems arising from connected toys.

Acknowledgment. The author wishes to thank Diana Cave (Criminology Department, University of Ottawa) for assisting in conducting the research, and Professor Valerie Steeves (Criminology Department, University of Ottawa) for her valuable comments on previous drafts of this article.

References
1. Miorandi, D., Sicari, S., De Pellegrini, F., Chlamtac, I.: Internet of Things: vision, applications and research challenges. Ad Hoc Netw. 10(7), 1497–1516 (2012). https://doi.org/10.1016/j.adhoc.2012.02.016
2. Holloway, D., Green, L.: The internet of toys. Commun. Res. Pract. 2(4), 506–519 (2016)
3. Mascheroni, G., Holloway, D. (eds.): The Internet of Toys: A Report on Media and Social Discourses Around Young Children and IoToys. DigiLitEY, London (2017)
4. Dobbins, D.L.: Analysis of security concerns and privacy risks of children's smart toys. Ph.D. Dissertation. Washington University St. Louis, St. Louis, MO, USA (2015)
5. Steeves, V., Jones, O.: Surveillance, children and childhood (Editorial). Surveill. Soc. 7(3/4), 187–191 (2010)
6. Nelson, B.: Children's Connected Toys: Data Security and Privacy Concerns. United States Congress Senate Committee on Commerce, Science, and Transportation, 14 December 2016. https://www.hsdl.org/?view&did=797394. Accessed 4 July 2017

A Reinforcement Learning Multiagent Architecture Prototype for Smart Homes (IoT)

Mario Rivas and Fernando Giorno
Instituto de Pesquisas Tecnológicas – IPT, São Paulo, Brazil
mariorivas@hotmail.com, fgiorno@gmail.com

Abstract. Continuous technology progress is fueling the delivery of new and less expensive IoT components, providing a variety of options for the Smart Home. Although most of the components can be easily integrated, achieving an optimal configuration that prioritizes environmental goals over individual performance strategies is a complex task that requires manual fine-tuning. The objective of this work is to propose an architecture model that integrates reinforcement learning capabilities in a Smart Home environment. In order to ensure the completeness of the solution, a set of architecture requirements was elicited. The proposed architecture is extended from the IoT Architecture Reference Model (ARM), with specific components designed to coordinate the learning effort, as well as data governance and general orchestration. Besides confirming the fulfillment of the architecture requirements, a simulation tool was developed to test the learning capabilities of a system instantiated from the proposed architecture. After six million four hundred thousand execution cycles, it was verified that the system was able to learn in most of the configurations.
Unexpectedly, results show very similar performance for collaborative and competitive environments, suggesting that a more varied selection of agent scenarios should be tested as an extension of this work, to confirm or contest the Q-Learning hypothesis.

Keywords: IoT · Reinforcement learning · Q-Learning · Architecture

1 Introduction

Considering the continuous progress in the scientific landscape that facilitates the delivery of new IoT (Internet of Things) components, and the absence of a single fully adopted industry standard [1], the goal of achieving an optimally efficient setup for a Smart Home relies on empirical approaches and context-based rules, rather than AI techniques. Furthermore, strategies to achieve context-specific goals like energy efficiency, home safety or environmental control require pre-emptive knowledge of the components and their interactions, reducing flexibility and resilience. The objective of this work is to propose an abstract architecture model that integrates reinforcement learning capabilities in a Smart Home environment, allowing real-time agent configuration and information exchange governance. By this means, concrete systems derived from this architecture will be able to learn optimal strategies to achieve environmental goals.

The rest of this paper is structured as follows. Section 2 introduces related research that contributed to the background of this work. The proposed architecture is presented in Sect. 3, describing the architecture requirements, the design approach and the final architecture description. Section 4 details the testing process, introducing the design of the testing tool, the simulation cases and their results. Finally, conclusions and future scope are included in Sect. 5.

2 Related Work and Research Contribution

2.1 Related Work

Several approaches have been proposed to resolve the complex interaction issues of IoT environments and their dynamic configuration requirements. In the specific field of manufacturing, Katasonov et al. [2] introduced the concept of multiagent platforms with autonomous behaviour to overcome the interoperability issues derived from the multiplicity of standards and protocols. Wang et al. [3] proposed an agent-based hybrid service delivery, composed of four subsystems: (1) hybrid services based on agents, (2) a hybrid-service ontological search engine, (3) a service enablers repository and (4) a service-oriented agent lifecycle manager. In order to reduce the uncertainty of the inherently stochastic IoT environment, Nastic et al. [4] introduced the platform U-GovOps to manage elastic IoT systems, applying a declarative proprietary language to define policies and resolve real-time issues. While most authors defined solutions based on multiagent systems, few references to agent learning techniques were found [3], and there was no specific mention of reinforcement learning.

2.2 Research Contribution

This paper presents an integrated vision of several recent studies related to Smart Home architecture powered by multiagent systems and reinforcement learning techniques, defining a framework to instantiate concrete architectures. In order to verify the learning capacity of the resulting architecture, a simulation tool was created in which 64 different scenarios were tested. The results of these simulations are relevant to understanding the impact of the hyperparameters in the reinforcement learning approach.
3 Proposed Architecture

This section describes the proposed architecture model to cover the objectives explained in Sect. 1. Initially, a summary of the architecture requirements is listed, followed by an explanation of the design approach and finally the architecture description itself.

3.1 Architecture Requirements

The general objective of this architecture is to provide learning capabilities in a Smart Home architecture based on a multiagent system, supporting online reconfiguration and resilience. The list of architecture requirements, classified into functional and non-functional requirements, is presented in Table 1.

Table 1. Architecture requirements
Functional requirements:
– Initial system configuration: the architecture provides suitable artefacts for system configuration
– Learning process oversight: individual agent learning progress is calculated and utilized in the reward provision
– New agent inclusion: new agents are included in real time
– Agent removal: the architecture provides artefacts to remove agents in real time
– System parameters modification: system parameters are modifiable in real time
– Information consumers coordination: external information consumers can be added/removed in real time
– External information flow: system information flows externally to the consumers, according to the information governance in place
– System governance control: the architecture provides artefacts to define and manage governance
Non-functional requirements:
– Resilience: the architecture provides a redundant structure to support operations continuity
– Scalability: system resource requests are anticipated and capacity limitations are proactively managed
– Performance: component orchestration is aligned with system performance

3.2 Design Approach

The architecture reference model for IoT (ARM) was developed in a joint effort by the European Platform on Smart Systems (EPoSS) and the IoT-A project. Its main function is to provide a common structure and a set of guidelines to elaborate concrete IoT architectures in different contexts. ARM consists of a set of interdependent sub-models describing the basic aspects of the reference architecture. The intersection of this model with the system requirements determines the instantiated architecture, represented by views and perspectives. The basic models described by ARM are: IoT Domain (physical entities and their logical representation, etc.), Information Domain (information structures, service modelling, etc.), Functional Domain (group of functionalities included), Communication Domain, and Trust, Security and Privacy Domain.

The approach of Bassi et al. [5] to generating architectures based on ARM is supported by the usage of views and perspectives as described by Rozanski and Woods [6]. The set of basic views suggested by these authors [6] is: Functional, Information, Concurrent, Development, Deployment and Operational. This collection of views provides a comprehensive description of the architecture; however, it does not explicitly consider non-functional requirements like information security or resilience. Since this type of requirement is orthogonal to the functional requirements, Rozanski and Woods [6] suggest documenting them as "perspectives", describing their intersection with functional views as a complement to the main description.
Following the recommendation of those authors, this work considered the following perspectives: Information Security, Performance and Scalability, Availability and Resilience, and Evolution. A graphic view of the ARM components and their interaction is represented in Fig. 1.

Fig. 1. Architecture reference description model after ARM.

3.3 Architecture Description

The main concept of the proposed architecture is based on the virtualization of the agents and their asynchronous learning management. It is composed of the following elements: Physical Context, Virtual Agent Farm (VAF), Asynchronous Data Layer (ADL), Data Exchange Manager (DEM), Context Manager and Learning Manager, displayed below in Fig. 2.

Fig. 2. Proposed architecture main components.

The Physical Context is the representation of the elements that compose the environment, such as sensors, actuators and other hardware devices. Depending on the number of sensors and actuators, several non-exclusive combinations may be defined to instantiate corresponding agents. The VAF is the logical component that stores the virtual agents and their system processes. The ADL intermediates data traffic among the different components, assuring the persistence and resilience of the information. Data exchange with external/internal consumers and publishers is managed by the DEM, based on the information governance defined in the configuration and administered by the Context Manager. The Context Manager is in charge of system orchestration, initiating and controlling all processes and resources. The Learning Manager calculates and distributes rewards to the agents and oversees the system learning process. While a full representation of the views and perspectives of the proposed architecture exceeds the scope of this paper, the functional view, information view and context view are briefly described below.

Fig. 3. Functional view.

The functional view describes four main logical components and their basic interactions. As depicted here, the Context Manager orchestrates and supervises most of the functional flows within the system. Although the ADL is embedded in the background of this visual representation, none of its functionalities justify its inclusion as a logical component.

The entities included in the Information View diagram represent the main information concepts and their composition/aggregation relationships. An information flow diagram (usually described using a UML message flow diagram) complements this view, representing the information system lifecycle. The most important entities of the proposed architecture are described in Fig. 4.

Fig. 4. Information view.

As defined by Rozanski and Woods [6], the Context View describes the relationships, dependencies and interactions between the system and its environment. The proposed architecture is defined by the logical and physical context. Data and software components are included in the logical context, while hardware components are included in the physical context. External entities interacting with the system are represented as out-of-system-boundary in Fig. 5.

Fig. 5. Context view.

4 Testing

The proposed architecture was designed to cover all the architecture requirements mentioned in Sect. 3; however, its material verification cannot be performed without a concrete system derived from it.
While creating a concrete IoT system to assess the feasibility of the proposed architecture is out of the scope of this work, an execution simulator tool (EST) was designed to confirm the learning capabilities of the solution.

4.1 EST Design

The EST was developed as a functional prototype of the proposed architecture, considering its main structures and the relationships among components. Due to the experimental approach and the limited resources, some particularities were defined:

– The physical environment was reduced to a bi-dimensional space;
– Every agent has an individual id, a pair of coordinates (x, y) and a reference to two other agents, known as its "vertices";
– There are nine (9) possible actions to be taken by an agent at any cycle: stand still or move in one of eight (8) possible directions (0°, 45°, 90°, 135°, 180°, 225°, 270° or 315°);
– At every cycle each agent knows its current coordinates and the coordinates of each of its vertices;
– The individual reward calculation is based on the angular difference between the triangle formed by the agent itself and its vertices and a hypothetical equilateral triangle. To calculate the difference, every internal angle of the triangle is compared with a target of 60°, and the sum of these three differences is subtracted from 120:

R = 120 − (|ang1 − 60| + |ang2 − 60| + |ang3 − 60|)

4.2 Software Project

The prototype was developed on Linux Ubuntu 16, using the Python 3.6 language and the machine learning library PyTorch 0.2.0. Its neural network was built using a deep Q-learning approach, with five (5) input parameters, a hidden layer of thirty (30) neurons and nine (9) output parameters (one per possible agent action). The loss optimization function applied was adaptive moment estimation (ADAM), and the neuron activation was linear rectification (ReLU).

4.3 Simulation Cases

In order to define the simulation cases, the following parameters were considered: (1) number of agents, (2) algorithm learning rate (alpha), (3) softmax policy temperature (tau) and (4) number of execution cycles. The number of agents was limited to four, using the following configurations: (a) three agents with only one active agent, (b) three agents with only two active agents, (c) three agents all active and (d) four agents all active. The learning rate is a parameter utilized by the Q-Learning algorithm [7] to define the prevalence of new knowledge over previous knowledge. Large alpha values (closer to 1) imply a faster substitution of knowledge, while lower values (closer to 0) imply a more conservative approach. The values defined for the simulation cases were {0.2, 0.5, 0.8}. The decision policy utilized by the tool is a version of Softmax [8] implemented in the PyTorch library. This policy aims to define whether to choose the greedy action (the best-known action for a specific state) or a random action (to explore the environment) based on the amount of knowledge currently harvested, i.e. the more knowledge, the more likely a greedy action is chosen. The "temperature" parameter (tau) provides a magnitude to the policy. The values chosen to create the test cases were {0.01, 0.1, 1, 10, 100}. The number of execution cycles was determined after some exploratory cases, aiming to gather enough executions to support the conclusions within the expected timeframe. As a result, the target number of execution cycles was ten thousand (10,000).
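To make the mechanics above concrete, the following is a minimal Python/PyTorch sketch of the angular reward, the 5-30-9 network and a temperature-controlled softmax action selection, roughly as described in Sects. 4.1 to 4.3. It is an illustrative reconstruction rather than the prototype's actual code: the exact temperature scaling used by the tool is not stated, so the standard Boltzmann form exp(Q/tau) is assumed here, and all names and values other than those given above are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


def triangle_reward(ang1, ang2, ang3):
    """Angular reward from Sect. 4.1: 120 minus the summed deviation of the
    triangle's internal angles from the 60 degrees of an equilateral triangle."""
    return 120 - (abs(ang1 - 60) + abs(ang2 - 60) + abs(ang3 - 60))


class AgentNet(nn.Module):
    """Network shape described in Sect. 4.2: 5 inputs, one hidden layer of
    30 ReLU units, 9 outputs (stand still or move in one of 8 directions)."""

    def __init__(self, n_inputs=5, n_hidden=30, n_actions=9):
        super().__init__()
        self.fc1 = nn.Linear(n_inputs, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_actions)

    def forward(self, state):
        return self.fc2(F.relu(self.fc1(state)))


def select_action(q_values, tau):
    """Softmax (Boltzmann) policy: sample an action with probability
    proportional to exp(Q / tau). The scaling convention is an assumption,
    since the paper does not give the exact form used by the prototype."""
    probs = F.softmax(q_values.detach() / tau, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()


# Hypothetical single decision step for one agent.
net = AgentNet()
optimizer = optim.Adam(net.parameters())   # ADAM optimizer, as in Sect. 4.2
state = torch.rand(1, 5)                   # stand-in encoding of own and vertex coordinates
action = select_action(net(state), tau=0.1)
print("chosen action:", action, "example reward:", triangle_reward(70, 55, 55))
```

A full training step would additionally compute a temporal-difference target from the reward and the next state's Q-values and back-propagate the resulting loss through the Adam optimizer; that loop is omitted here for brevity.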
A Reinforcement Learning Multiagent Architecture Prototype 167 For each agent con?guration, ?fteen (15) parameter combinations were de?ned (three learning rate values x ?ve policy temperatures) plus a scenario without learning capabilities, totalizing sixty-four (64) combinations. Each scenario was executed ten (10) times, through ten-thousand (10,000) cycles, completing six-million four-hundred-thousand cycles. 4.4 Results Every execution generated a text ?le containing the reward of the system (calculated as the sum of the individual rewards) for each cycle and its correspondent graph. In order to evaluate the convergence of the learning curve, reward values were segmented in ?fty (50) stages, and standard deviation was calculated each one. Whenever the standard deviation remains decreasing or stable at a very low value, learning curve convergence is con?rmed. In general, all test cases con?rmed the convergence of the learning curves, except for a few cases where policy temperature was very low and (as expected) scenarios with no learning capabilities. Figure 6 describes the results of the simulations consolidated by policy temperature. Fig. 6. Learning curve convergence by policy temperature. When compared scenarios by learning rate, no relevant di?erence was found, as shown in Fig. 7. 168 M. Rivas and F. Giorno Fig. 7. Learning curve convergence by learning rate. 5 Conclusions The objective of this work was to de?ne an architecture of reference that provides learning capabilities to a Smart Home environment, allowing for real-time component con?guration and external information governance, as described on the architecture requirements section. The proposed architecture de?nes components and functionalities covering the architecture requirements, introducing reinforcement learning features. Simulated scenarios executed also con?rmed the learning curve convergence of the system, under several di?erent con?gurations. According to the Q-Learning algorithm de?nition [7], collaborative multiagent systems should converge to an optimal policy in a ?nite number of cycles, however this is not guaranteed for competitive environments. Unexpectedly, test results shown a very similar convergence curve for collaborative and competitive environments, suggesting that a more variated selection of agent scenarios should be tested as an extension of this work, to con?rm or contest Q-Learning hypothesis. Future extensions of this work may cover the study of learning convergence curves for more variated con?gurations, eventually approaching to real life smart home setups. Another study path suggested by the results of this work refers to the possibility of sharing intelligence among di?erent con?gurations, by persisting the agent neural networks. Figure 8 represents a consolidated view of the simulations executed by agent con?g- uration. A Reinforcement Learning Multiagent Architecture Prototype 169 Fig. 8. Learning curve convergence by agent con?guration. References 1. Madakam, S., Ramaswamy, R., Tripathi, S.: Internet of Things (IoT): a literature review. J. Comput. Commun. 3, 164–173 (2015) 2. Katasonov, A., Kaykova, O., Khriyenko, O., Nikitin, S., Terziyan, S.: Smart semantic middleware for the Internet of Things. In: Proceedings of the 5th International Conference on Informatics in Control, Automation and Robotics, Portugal, pp. 169–178 (2008) 3. Wang, J., Zhu, Q., Ma, Y.: An agent-based hybrid service delivery for coordinating internet of things and 3rd party service providers. J. Netw. Comput. Appl. 
36, 1684–1695 (2013) 4. Nastic, S., Copil, G., Truong. H., Dustdar. S.: Governing elastic IoT cloud systems under uncertainty. In: 2015 IEEE 7th International Conference on Cloud Computing Technology and Science, pp. 131–138. IEEE, Canada (2015) 5. Bassi, A., Bauer, M., Fiedler, M., Kramp, T., Van Kranenburg, R., Lange, S., Meissner, S.: Enabling Things to Talk: Designing IoT solutions with the IoT Architectural Reference Model, p. 349. Springer, Berlin (2013) 6. Rozanski, N., Woods, E.: Software Systems Architecture. Working with Stakeholders Using Viewpoints and Perspectives, p. 529. Pearson, London (2005) 7. Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, UK (1989) 8. Tuyls, K., Weiss, G.: Multiagent learning: basics, challenges and prospects. AI Mag. 3, 41–52 (2012) 170 M. Rivas and F. Giorno Real-Time Air Pollution Monitoring Systems Using Wireless Sensor Networks Connected in a Cloud-Computing, Wrapped up Web Services Byron Guanochanga1 , Rolando Cachipuendo1 , Walter Fuertes1(B) , Santiago Salvador1 , Diego S. Ben´itez2 , Theo?los Toulkeridis1 , Jenny Torres3 , C´esar Villac´is1 , Freddy Tapia1 , and Fausto Meneses1 1 Universidad de las Fuerzas Armadas ESPE, 171-5-231B Sangolqu´i, Ecuador {beguanochanga,recachipuendo,wmfuertes,mssalvador,ttoulkeridis, cjvillacis,fmtapia,fhmeneses}@espe.edu.ec 2 Universidad San Francisco de Quito USFQ, Campus Cumbay´a, Casilla Postal, 17-1200-841 Quito, Ecuador dbenitez@usfq.edu.ec 3 Escuela Polit´ecnica Nacional, P.O. Box 17-01-2759, Quito, Ecuador jenny.torres@epn.edu.ec Abstract. Air pollution continues to grow at an alarming rate, decreas-ing the quality of life around the world. As part of preventive mea-sures, this paper presents the design and implementation of a secure and low-cost real-time air pollution monitoring system. In such sense, a three-layer architecture system was implemented. The ?rst layer contains sensors connected to an Arduino platform towards the data processing node (Raspberry’s Pi), which through a wireless network sends messages, using the Message Queuing Telemetry Transport (MQTT) protocol. As a failback method, strings are stored within the data processing nodes within ?at ?les, and sent via SSH File Transfer Protocol (SFTP) as a restore operation in case the MQTT message protocol fails. The appli-cation layer consists of a server published in the cloud infrastructure having an MQTT Broker service, which performs the gateway functions of the messages sent from the sensor layer. Information is then published within a control panel using the NODE-RED service, which allowed to draw communication ?ows and the use of the received information and its posterior storage in a No SQL database named “MongoDB”. Fur-thermore, a RESTFUL WEB service was shared in order to transmit the information for a posterior analysis. The client layer can be accessed from a Web browser, a PC or smartphone. The results demonstrate that the proposed message architecture is able to translate JSON strings sent by the Arduino-based sensor Nodes and the Raspberry Pi gateway node, information about several types of air contaminants have been e?ectively visualized using web services. Keywords: Air pollution · IoT· IaaS · WSN · Web services .e c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 171–184, 2019. https://doi.org/10.1007/978-3-030-02686-8_14 172 B. Guanochanga et al. 
1 Introduction The World Health Organization (WHO) [1] reported that “Air pollution is the biggest environmental risk to health, carrying responsibility for about one in every nine deaths annually”. Although industry and the scienti?c community have developed various solutions based on conventional Wireless Sensor Net-works (WSN) for air pollution monitoring, the existing products and the gener-ated results lack to represent low-cost solutions, some require hiring hosting or web services, as well as having a number of limited messages without a failback method. The aim of this work is to develop a secure environmental monitoring sys-tem based on WSN that are integrated to the Internet of Things (IoT) concept, increasing the capacity and life span of the sensor nodes of the WSN with rel-ative low-costs. Therefore, ?rst, a hardware and software prototype has been assembled using Arduino and Raspberry Pi platforms, comprising several air pollution sensors as well as newly designed and constructed wireless expansion modules. Second, a three-layer architecture, which leverages a real-time air pol-lution monitoring system has been designed and implemented: (1) The ?rst sensor layer includes the electronic hardware circuits and the software compo-nents, both for the Arduino-based sensor nodes and the gateway node, which was assembled using a Raspberry Pi together with a low-cost wireless expansion module for capturing the data. (2) The application layer, where a Web service has been designed and implemented using a set of protocols and formats that are used to process the data and store them in a MongoDB Database as part of the Cloud infrastructure. (3) The client layer, which consists of a Web graphical user interface, providing a visual information about environmental parameters in order to allow the communication with the WSN and users. The main contributions of this paper include: (1) The creation of a low-cost wireless monitoring system (i.e., software) as an IoT application to visualize the levels of air pollution. (2) The implementation of a novel three-layer message architecture to translate JSON strings sent by Arduino-based sensor Nodes and the Raspberry Pi gateway node, which are e?ectively visualized in Web services. (3) A failback method as a process for restoring operations via SFTP protocol, in case the MQTT message protocol fails. The remainder of this paper is organized as follows: Sect. 2 discusses related work, Sect. 3 presents the experimental setup, as well as the implementation of electronic devices and web services, while Sect. 4 provides the experimental results; ?nally, Sect. 5 ends the paper with the conclusion and future work. 2 Related Work The scienti?c community has been developing innovative alternatives to mea-sure air pollution using WSN. Nevertheless, several studies has been designed conventionally. In relation to low-power wireless communication protocols, similar to this work, some authors such as [2–14] have used ZigBee technology (based on Real-Time Air Pollution Monitoring Systems Using WSN 173 the IEEE 802.15.4). Conversely, in this work the NRF24L01 radio frequency transceiver module, [15] which has an advanced energy management, was used. The NRF24L01 has an enhanced Shock- Burst hardware protocol accelerator, which helps to implement a robust and advanced wireless network with low-cost micro-controllers. In relation to the connection platform for the di?erent nodes, the study pro-posed by [7] used Octopus II. 
The sensor node implemented had a humidity sensor, temperature and a CO sensor. In [11], the same device was used, with the di?erence that the 501A Dust sensor module (DSM501A) was added, which was designed to detect particles larger than 1 µm. In [5,16,17] the Waspmote platform was applied, which is characterized by the use of lower energy consump-tion. In [6], nodes were prepared to monitor gases such as carbon monoxide (CO), nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), metals such as lead (Pb) and particulate matter. In [16] authors proposed a clustering protocol for the sensor network. For the connection of di?erent sensors, di?erent models of the Arduino plat-form have been used. For instance, in [12] the Arduino Mega 128 microcontroller was used together with the MQ-7 sensitive gas sensor detector in order to deter-mine CO. For the implementation of the sensor node in [18], the Arduino one with the Digi XBee module were used for the wireless mesh communication of the nodes. Similarly, in [19] authors used the Arduino R3 board that has an Atmel Atmega328 microcontroller with a clock speed of 16 MHz, together with a XBee model. Raspberry Pi model B was also used for the base station, where a database has been available for the storage of the received readings and a Web application was used for data presentation. The majority of these studies [12,19–34,36] resemble this work since the same open-source Arduino platform is used. However, they di?er in the way the data is transmitted towards the database, since a Raspberry Pi acting as the Gateway node is used in this work, using a three-layer message architecture, together with the NRF24L01 module for Wireless communication. Regarding the number of sensors for measuring air quality parameters, in [32] a device was implemented to monitor the CO in di?erent industrial plants. In [35] temperature and relative humidity data were collected using the SHT11 and SHT75 sensors, respectively. In [36], a predesigned sensor node, called CanarIT was used, which displayed several sensors. Data from each sensor node were stored in the cloud by GPRS communication. In [37], the sensors used were MG-811 for CO2, MQ-7 for CO and GP2Y1010AU0F for powder particles. In comparison with this study, most sensor nodes determined only up to four pol-lutants, including the most common being CO, CO2 and particulate matter. Nonetheless, more sensors were implemented in this study in order to mea-sure more pollutants, including CO, CO2, methane (CH4), sulfur dioxide (SO2), hydrogen sul?de (H2S), NO2 and particulate material (2.5 and 10 µm). Furthermore, similar to the study proposed in [37], in this work all data have been stored in a non-relational database and processed in a private cloud computing infrastructure. 174 B. Guanochanga et al. 3 Experimental Setup The general architecture of the real-time air pollution monitoring system is illus-trated in Fig. 1. The system has been divided into three layers. First, the Sensors layer is formed by the sensor nodes (SN) connected by Arduino R3 boards located in a distributed manner and the Gateway node, consisting of a Raspberry Pi board, forming a WSN. The sensor nodes send the polluting gas measurement information to the corresponding Gateway node wirelessly. Second, the Gate-way node with Internet access sends the received information to an application server in the cloud computing. The information will be stored in a non-relational database such as MongoDB. 
Third, this information will be published on a Web page so that users would be able to access it through their Web browser and smartphones. Fig. 1. Architecture that leverages the WSN system. 3.1 Sensor Nodes The electronic circuit diagram of a typical Sensor Node prototype, which depicts the connections made in each sensor node, is shown in Fig. 2. It consists of the Arduino board, the Wireless module NRF24L01, and the CO, CO2, CH4, SO2, H2S, NO2 and particulate material sensors. For the measurement of polluting gases, the modules MQ-7 (CO), MG-811 (CO2), MQ-4 (CH4), MQ-136 (SO2, Real-Time Air Pollution Monitoring Systems Using WSN 175 Fig. 2. Schematic electronic circuit diagram for each Sensor Node. and H2S) and MICS-2714 (NO2) were used. Finally, for the measurement of particulate material of 2.5 and 10 µm, the digital sensor HK-A5 was used. The CO, CH4, SO2, H2S and NO2 sensors are Metal Oxide Semiconductor (MOS) based sensors. This type of sensors displays a small heating element inside as well as an electrochemical sensor. The heater is necessary in order to ?t the sensor to the its proper operating conditions, since the sensitive surface of the sensor will react only at certain temperatures. The detection principle is based on the change of resistance due to incoming gas contact. The CO2 sensor is a chemical sensor that operates under the principle of a solid electrolyte cell. When the sensor is exposed to CO2, chemical reactions occur in the cell producing an electromotive force. The temperature of the sensor must be high enough for these reactions to occur. Therefore, a heating circuit was used to heat up the sensor to an adequate temperature. The MOS sensors required signal conditional circuits for converting their readings to voltage that will be measured by the Arduino board. Similarly, the CO2 module has an ampli?er circuit to improve the accuracy of the measure-ments since the output voltage of the sensor is relatively low. Sensor voltages were measured by the analog inputs of an Arduino Mega microcontroller board. The particulate material sensor communicates serially with the Arduino board. Figure 2 shows the sensor connections with the Arduino Mega board. For wire- 176 B. Guanochanga et al. less communication, the NRF24L01 transceiver module was used operating in the 2.4 GHz ISM band. Since the Arduino board lacks of enough connections for powering the sensor modules, a shield-type board was designed for connecting the board with the sensors and the wireless extension module. The Gateway node consists of a Raspberry Pi board, and a NRF24L01 Wireless expansion module. This node receives the measurements from all connected sensor nodes. 3.2 Hardware Prototype The sensor node prototype was implemented inside a sealed chamber in which the sensor modules were placed, as shown in Fig. 3. The chamber has two air ducts, air is sucked by a fan to the interior and then it escapes towards the outside. The sensors and the wireless module were connected to the shield and to the Arduino board, where all the sensor node is controlled. Signals from the sensors were interpreted to di?erent gas concentrations according to the characteristic curves described in their corresponding data sheets. Fig. 3. Photograph of the sensor node prototype. 3.3 Web Services In order to send messages from the Gateway node to the application server, the MQTT protocol was used with the character string and format illustrated in Fig. 4. 
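The exact string format is the one shown in Fig. 4 and detailed in the next paragraph. As a rough, hypothetical sketch of how a gateway node might assemble and publish one such reading, the snippet below uses the Eclipse Paho MQTT client; the topic name, field names, broker address and the classic paho-mqtt 1.x client API are all assumptions for illustration, not the authors' implementation.

```python
import json
import time
import paho.mqtt.client as mqtt  # assumes the classic paho-mqtt 1.x API

BROKER_HOST = "broker.example.org"   # assumed address of the cloud MQTT broker
TOPIC = "wsn/air/node01"             # assumed topic naming scheme

def build_payload(node_id, ip, lat, lon, readings):
    # JSON string with node ID, IP, timestamp, coordinates and sensor readings
    return json.dumps({
        "id": node_id,
        "ip": ip,
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "lat": lat,
        "lon": lon,
        "measurements": readings,   # e.g. {"co": 3.1, "co2": 812, "pm25": 9.4}
    })

client = mqtt.Client()
client.connect(BROKER_HOST, 1883, keepalive=60)

payload = build_payload("node01", "192.168.1.20", -0.33, -78.45,
                        {"co": 3.1, "co2": 812.0, "pm25": 9.4})
client.publish(TOPIC, payload, qos=1)

# Failback idea: also append the same string to a flat file so it can later be
# transferred via SFTP if the MQTT path fails.
with open("node01_backlog.log", "a") as f:
    f.write(payload + "\n")
```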
The format uses the JavaScript Object Notation (JSON) type, composed of the sensor node ID, the IP address, date and time of measurement, latitude and longitude, as well as the measurements of the sensors. As a failback method for handling errors, this string is also stored in the Gateway node in a flat file after it has been sent to the application server through the SFTP protocol, with the purpose of processing it and acting as redundancy in case the MQTT message protocol fails.

Fig. 4. String chain with a format based on the MQTT protocol.

The application server has an MQTT Broker service, which represents a central node or broker server and is responsible for managing the network by receiving the messages sent from the Gateway nodes. The system has a Delay-Access process, which synchronizes the reception of messages from the processing nodes. This process continuously checks the status of the messages to guarantee their availability and, when a node fails, verifies that the failback option has been performed for it. With the NODE-RED service installed on the server, several information flows have been created in order to publish the measurement data received from the MQTT Broker on the Web and, at the same time, to store them within a MongoDB database. Additionally, an information flow was implemented through a RESTful Web service and a GET method, which allowed information to be retrieved from the database in order to be shared with other systems. Figure 5 presents the control panel information about the state of the system's central node; it displays the temperature, CPU load and memory consumption, which may help to diagnose the status of the WSN. Figure 6, on the other hand, shows an example of the real-time monitoring of methane by one of the sensors, sent by the central node.

4 Results and Discussion

For the proof of concept of this monitoring solution, several pollutant measurements were taken every seven seconds. These measurements were conducted at three different locations in Ecuador: a university campus located in the city of Sangolquí, the southern zone of the city of Machachi, and the “La Virgen Santisima” cave in Tena [38]. A total of 260 samples were obtained for the CO, CO2, CH4, SO2, H2S and NO2 gases, and for powder density of types PM2.5 and PM10. The obtained measurements were within the detection ranges of the gas sensors used in the prototype. Table 1 shows the sensor ranges together with the typical concentrations of such gases in the environment. Figure 7 shows the resulting CO2 measurements for the three locations; as can be seen, the concentration inside the cave is much higher than in the cities of Sangolquí and Machachi. An average of 1240 ppm was obtained inside the karstic cave with a standard deviation (SD) of 319 ppm. In Sangolquí, it was about 962 ppm with an SD of 112 ppm, while in Machachi an average of 794 ppm with a corresponding SD of 89 ppm was obtained.

Fig. 5. Control panel of the central node of the system.
Fig. 6. Example of real-time monitoring of methane by one of the sensors.

Table 1. Types of polluting gases and measurement ranges for the sensors used

Polluting gas              Sensor      Range            Reference
Carbon monoxide, CO        MQ-7        20–200 ppm       Ecua. Stand.
Carbon dioxide, CO2        MG-811      400–10000 ppm    Ref. value
Methane, CH4               MQ-4        200–10000 ppm    Ref. value
Sulfur dioxide, SO2        MQ-136      1–200 ppm        WHO
Hydrogen sulfide, H2S(a)   MQ-136      1–100 ppm        OSHA
Nitrogen dioxide, NO2      MiCS-2714   0.05–5 ppm       WHO
PM2.5/PM10                 HK-A5       0–999 ug/m3      WHO
(a) Occupational Safety and Health Administration (OSHA), USA.

Fig. 7. Comparison of CO2 measurement results in the three different locations.

Figure 8, on the other hand, shows the data obtained for the 2.5-micron particulate material. In Sangolquí, it reached up to 10 ug/m3, a higher concentration density than Machachi with about 5 ug/m3, while for the cave this concentration is about 6 ug/m3. Nevertheless, the Ecuadorian standard of air quality [1] specifies a limit of about 50 ug/m3 as an average over a day of monitoring and 15 ug/m3 as an annual average; therefore, the density of dust particles in the studied sectors remains within the recommended levels.

Fig. 8. Measurements of PM2.5 at three different locations.
Fig. 9. Measurements of PM10 at three different locations.

Finally, Fig. 9 illustrates the measurements of particulate material finer than 10 microns at the three locations. Here, the Sangolquí sector again presents mostly data of about 12 ug/m3, higher than the Machachi sector with 6 ug/m3 and the Amazonian cave with about 8 ug/m3. Similarly, the Ecuadorian standard of air quality establishes that the annual PM10 concentration should not exceed 50 ug/m3 and that the daily average should not exceed 100 ug/m3. Therefore, the measurements obtained for the three locations comply with the values recommended by the WHO and the Ecuadorian standard of air quality.

5 Conclusions and Future Work

This paper focused on the design and implementation of a real-time air pollution monitoring system based on the use of WSN under the IoT concept, using a cloud computing infrastructure. A three-layer architecture was designed and implemented with low-cost electronic hardware, such as Arduino-based sensor nodes as well as a Raspberry Pi-based gateway node with a low-cost wireless expansion module that captures the data. In addition, a Web service was also designed and implemented using a set of protocols and formats used to process the data and store them in a MongoDB database as part of the cloud infrastructure. The implemented Web graphical user interface allowed the communication between the WSN and the users. Compared with other solutions described in the literature, the solution proposed here is secure, since the same string is stored within the data processing nodes in a flat file and sent to the application layer by means of SFTP, acting as a failback method, in order to process it and keep it as redundancy in case the MQTT message protocol fails. Next steps will include the integration of the proposed solution with an analytical data system based on big data tools, as well as performance improvements in the capture of the frames by using an Odroid electronic board.

Acknowledgment. The authors would like to thank the Ecuadorian Corporation for the Development of Research and the Academy (RED CEDIA) for its financial support in the development of this work, under Project Grant CEPRA-XI-2017-13.

References

1. World Health Organization: Ambient air pollution: a global assessment of exposure and burden of disease (2016)
2. Zhi-gang, H., Cai-hui, C.: The application of Zigbee based wireless sensor network and GIS in the air pollution monitoring.
In: 2009 International Conference on Environmental Science and Information Application Technology, Wuhan, pp. 546– 549 (2009). https://doi.org/10.1109/ESIAT.2009.192 3. Banghong, X., Yang, L., Honglei, Z., Junfeng, L.: Application design of wire-less sensor networks in environmental pollution monitoring. Comput. Measur. Control 2, 003 (2009) 182 B. Guanochanga et al. 4. Postolache, O.A., Dias Pereira, J.M., Silva Girao, P.M.B.: Smart sensors network for air quality monitoring applications. IEEE Trans. Instrum. Measur. 58(9), 3253– 3262 (2009). https://doi.org/10.1109/TIM.2009.2022372 5. Eren, H., Al-Ghamdi, A., Luo, J.: Application of Zigbee for pollution monitoring caused by automobile exhaust gases. In: 2009 IEEE Sensors Applications Sympo-sium, New Orleans, LA, pp. 164–168 (2009). https://doi.org/10.1109/SAS.2009. 4801799 6. Bader, S., Anneken, M., Goldbeck, M., Oelmann, B.: SAQnet: experiences from the design of an air pollution monitoring system based on o?-the-shelf equipment. In: 2011 Seventh International Conference on Intelligent Sensors, Sensor Networks and Information Processing, Adelaide, SA, pp. 389–394 (2011). https://doi.org/ 10.1109/ISSNIP.2011.6146632 7. Liu, J.H., Chen, Y.F., Lin, T.S., Lain, D.W., Wen, T.H., Sun, C.H., Jiang, J.A.: Developed urban air quality monitoring system based on wireless sensor networks. In: 2011 Fifth International Conference on Sensing Technology, Palmerston North, pp. 549–554 (2011). https://doi.org/10.1109/ICSensT.2011.6137040 8. Zhou, G., Chen, Y.: The research of carbon dioxide gas monitoring platform based on the wireless sensor networks. In: 2011 2nd International Conference on Arti?- cial Intelligence, Management Science and Electronic Commerce (AIMSEC), Deng Leng, pp. 7402–7405 (2011). https://doi.org/10.1109/AIMSEC.2011.6010423 9. Yan, Z., Eberle, J., Aberer, K.: OptiMoS: optimal sensing for mobile sensors. In: 2012 IEEE 13th International Conference on Mobile Data Management, Bengaluru, Karnataka, pp. 105–114 (2012). https://doi.org/10.1109/MDM.2012.43 10. Mao, X., Miao, X., He, Y., Li, X.Y., Liu, Y.: CitySee: urban CO2 monitoring with sensors. In: 2012 Proceedings IEEE INFOCOM, Orlando, FL, pp. 1611–1619 (2012). https://doi.org/10.1109/INFCOM.2012.6195530 11. Wang, C.H., Huang, Y.K., Zheng, X.Y., Lin, T.S., Chuang, C.L., Jiang, J.A.: A self sustainable air quality monitoring system using WSN. In: 2012 Fifth IEEE Inter-national Conference on Service-Oriented Computing and Applications (SOCA), Taipei, pp. 1–6 (2012). https://doi.org/10.1109/SOCA.2012.6449427 12. Devarakonda, S., Sevusu, P., Liu, H., Liu, R., Iftode, L., Nath, B.: Real-time air quality monitoring through mobile sensing in metropolitan areas. In: Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing, p. 15, August 2013. https://doi.org/10.1145/2505821.2505834 13. Kadri, A., Yaacoub, E., Mushtaha, M., Abu-Dayya, A.: Wireless sensor network for real-time air pollution monitoring. In: 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), Sharjah, pp. 1–5 (2013). https://doi.org/10.1109/ICCSPA.2013.6487323 14. Kelly, S.D.T., Suryadevara, N.K., Mukhopadhyay, S.C.: Towards the Implementa-tion of IoT for environmental condition monitoring in homes. IEEE Sens. J. 13(10), 3846–3853 (2013). https://doi.org/10.1109/JSEN.2013.2263379 15. 
Fuertes, W., Carrera, D., Villac´is, C., Toulkeridis, T., Gal´arraga, F., Torres, J., Aules, H.: Distributed system as internet of things for a new low-cost, air pollution wireless monitoring on real time. In: IEEE/ACM 19th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Chengdu, China, pp. 58–67 (2015). https://doi.org/10.1109/DS-RT.2015.28 16. Mansour, S., Nasser, N., Karim, L., Ali, A.: Wireless sensor network-based air quality monitoring system. In: 2014 International Conference on Computing, Net-working and Communications (ICNC), Honolulu, HI, pp. 545–550 (2014). https:// doi.org/10.1109/ICCNC.2014.6785394 Real-Time Air Pollution Monitoring Systems Using WSN 183 17. Kim, J.Y., Chu, C.H., Shin, S.M.: ISSAQ: an integrated sensing systems for real-time indoor air quality monitoring. IEEE Sens. J. 14(12), 4230–4244 (2014). https://doi.org/10.1109/JSEN.2014.2359832 18. Abraham, S., Li, X.: A cost-e?ective wireless sensor network system for indoor air quality monitoring applications. Procedia Comput. Sci. 34, 165–171 (2014). https://doi.org/10.1016/j.procs.2014.07.090 19. Ferdoush, S., Li, X.: Wireless sensor network system design using Raspberry Pi and Arduino for environmental monitoring applications. Procedia Comput. Sci. 34, 103–110 (2014). https://doi.org/10.1016/j.procs.2014.07.059 20. Liu, S., Xia, C., Zhao, Z.: A low-power real-time air quality monitoring system using LPWAN based on LoRa. In: 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), Hangzhou, pp. 379–381 (2016). https://doi.org/10.1109/ICSICT.2016.7998927 21. Sugiarto, B., Sustika, R.: Data classi?cation for air quality on wireless sensor net-work monitoring system using decision tree algorithm. In: 2016 2nd International Conference on Science and Technology-Computer (ICST), Yogyakarta, pp. 172–176 (2016). https://doi.org/10.1109/ICSTC.2016.7877369 22. Pieri, T., Michaelides, M.P.: Air pollution monitoring in lemesos using a wireless sensor network. In: 2016 18th Mediterranean Electrotechnical Conference (MELE- CON), Lemesos, pp. 1–6 (2016). https://doi.org/10.1109/MELCON.2016.7495468 23. Boubrima, A., Bechkit, W., Rivano, H.: Optimal WSN deployment models for air pollution monitoring. IEEE Trans. Wirel. Commun. 16(5), 2723–2735 (2017). https://doi.org/10.1109/TWC.2017.2658601 24. Pavani, M., Rao, P.T.: Real time pollution monitoring using Wireless Sensor Net-works. In: 2016 IEEE 7th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, pp. 1–6 (2016). https:// doi.org/10.1109/IEMCON.2016.7746315 25. Pavani, M., Rao, P.T.: Urban air pollution monitoring using wireless sensor net-works: a comprehensive review. Int. J. Commun. Netw. Inf. Secur. (IJCNIS) 9(3) (2017) 26. Hojaiji, H., Kalantarian, H., Bui, A.A.T., King, C.E., Sarrafzadeh, M.: Temper-ature and humidity calibration of a low-cost wireless dust sensor for real-time monitoring. In: 2017 IEEE Sensors Applications Symposium (SAS), Glassboro, NJ, pp. 1–6 (2017). https://doi.org/10.1109/SAS.2017.7894056 27. Jaladi, A.R., Khithani, K., Pawar, P., Malvi, K., Sahoo, G.: Environmental mon-itoring using Wireless Sensor Networks (WSN) based on IOT. Int. Res. J. Eng. Technol. (IRJET) 4, 1371–1378 (2017) 28. Sivamani, S., Choi, J., Bae, K., Ko, H., Cho, Y.: A smart service model in green-house environment using event-based security based on wireless sensor network. Concurrency Comput. Pract. Exp. 30, 1–11 (2018). https://doi.org/10.1002/cpe. 
4240 29. Yadav, M., Sethi, P., Juneja, D., Chauhan, N.: An agent-based solution to energy sink-hole problem in ?at wireless sensor networks. In: Next-Generation Networks, vol. 638, pp. 255–262. Springer, Singapore (2018). https://doi.org/10.1007/978- 981-10-6005-2-27 30. Aznoli, F., Navimipour, N.J.: Deployment strategies in the wireless sensor net-works: systematic literature review, classi?cation, and current trends. Wirel. Pers. Commun. 95, 819–846 (2017). https://doi.org/10.1007/s11277-016-3800-0 184 B. Guanochanga et al. 31. Xu, Y., Liu, F.: Application of wireless sensor network in water quality monitoring. In: 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Comput-ing (EUC), Guangzhou, pp. 368–371 (2017). https://doi.org/10.1109/CSE-EUC. 2017.254 32. Yu, J., Wang, W., Yin, H., Jiao, G., Lin, Z.: Design of real time monitoring system for rural drinking water based on wireless sensor network. In: 2017 International Conference on Computer Network, Electronic and Automation (ICCNEA), Xi’an, pp. 281–284 (2017). https://doi.org/10.1109/ICCNEA.2017.102 33. Yang, J., Zhou, J., Lv, Z., Wei, W., Song, H.: A real-time monitoring system of industry carbon monoxide based on wireless sensor networks. Sensors 15(11), 29535–29546 (2015) 34. Nikhade, S.G.: Wireless sensor network system using Raspberry Pi and Zigbee for environmental monitoring applications. In: 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), pp. 376–381 (2015) 35. Delamo, M., Felici-Castell, S., P´erez-Solano, J.J., Foster, A.: Designing an open source maintenance-free environmental monitoring application for wireless sensor networks. J. Syst. Softw. 103, 238–247 (2015) 36. Moltchanov, S., Levy, I., Etzion, Y., Lerner, U., Broday, D.M., Fishbain, B.: On the feasibility of measuring urban air pollution by wireless distributed sensor networks. Sci. Total Environ. 502, 537–547 (2015) 37. Chen, Z., Hu, C., Liao, J., Liu, S.: Protocol architecture for wireless body area network based on nRF24L01. In: 2008 IEEE International Conference on Automa-tion and Logistics, Qingdao, pp. 3050–3054 (2008). https://doi.org/10.1109/ICAL. 2008.4636702 38. Constantin, S., Toulkeridis, T., Moldovan, O.T., Villacis, M., Addison, A.: Caves and karst of Ecuador - state-of-the-art and research perspectives. Physical Geog-raphy in press (2018). https://doi.org/10.1080/02723646.2018.1461496 A Multi-agent Model for Security Awareness Driven by Home User’s Behaviours Farhad Foroughi(?) and Peter Luksch Institute of Computer Science, University of Rostock, Rostock, Germany {farhad.foroughi,peter.luksch}@uni-rostock.de Abstract. Computer users are limited to perform multitask operations and processing information. These limitations a?ect their decision and full attention on security tasks. The majority of cybercrimes and frauds including e?ective security decisions and practising security management are related to human factors even for experts. Information Security awareness and e?ective home user training depend on concrete information and accurate observation of user behav- iours and their circumstances. Users’ awareness and consciousness about security threats and alternatives motivate them to take proper actions in a security situa- tion. This research proposes a multi-agent model that provides security awareness based on users’ behaviours in interaction with home computer. 
Machine learning is utilized by this model to pro?le users based on their activities in a cloud infra- structure. Machine learning improves intelligent agent accuracy and cloud computing makes it ?exible, scalable and enhances performance. Keywords: Home user’s behaviour · Security awareness Intelligent multi-agent model · User pro?ling 1 Introduction Computer users are limited to perform multitask operations and processing information. These limitations a?ect their decision and full attention on security tasks. Two signi?- cant factors to choose the best action are individual perception climate and self-e?cacy [1, 2]. There is a wide range of home computer usage with di?erent types of users. More- over, research and study over the home computer security are challenging because there is no canonical and speci?c de?nition of home computer user. A home user may use a computer for shopping and banking and other normal daily tasks. The user could be students who use the computer for learning purposes and use educational software. The age and gender also may a?ect the using of a computer at home. According to these conditions and contexts, users’ information security behaviour is very dynamic and changeable. The di?erences between users a?ect their decisions to support security or often ignore it [3]. In addition, Information Technology brings new technology in houses, and the focus of security solutions is also technological. The majority of the cybercrimes and frauds even for experts are related to human factors including e?ective security decisions and practicing security management [4, 5]. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 185–195, 2019. https://doi.org/10.1007/978-3-030-02686-8_15 Byrne et al. [6] analysis presents that computer knowledge and expertise a?ects the importance of new threats. For example, integrity perception is signi?cant for users with extensive knowledge. They also provide evidences that users ignore privacy settings to follow their habits. Information Security awareness and e?ective home user training depend on concrete information and accurate observation of user behaviours and their circumstances. Without this information, it would be di?cult to provide e?ective advice or create proper policy. In additions, as long as individuals fail to provide secure behaviour and interact with computer safely, other relevant organisations such as government, ?nancial insti- tutes and shopping markets that provided online services could be in danger and at risk. This paper proposes a multi-agent model to provide security awareness and training material based on users’ behaviours and home computer interactions. This model uses machine learning on a cloud platform to analyse behaviours in real time or very close to that. In Sect. 2, signi?cant human factors that in?uence user’s decision in a security situation are discussed. Section 3 introduces the required characteristics of an e?ective awareness program for home users. User pro?ling is the process of capturing user-computer interaction to model user’s behaviours introduced in Sect. 4. Finally, Sect. 5 proposes a multi-agent model and discusses each element of the model. 2 Human Factors Psychologists and cognitive scientists say personal behaviours are linked to the person- ality pro?le. Some factors like age or age group, gender, personal interests and hobbies, occupation, education, and history of actions are included in the personality pro?le. 
This is important to understand users’ online activities and behaviours as well as their personality and occupation (or computer role) to be able to provide appropriate security awareness and training. By providing systematic awareness and guidance for all users sharing a computer or home network based on their behaviours, this e?ect could become a security culture. “Every [security] system is inadequate if there is no security culture shared by the whole sta?” [7]. The information security culture for home users is an important element to provide an e?ective and continues secure, safe behaviour [7]. The information security respon- sibility as well as physical security of users are the essential pieces of a comprehensive way to deal with information security management. Metalidou has categorised all human factors related to this aspect in four groups. These groups are (1) user interfaces of security-related systems; (2) information security management concerns for risk, busi- ness processes and ?nance; (3) organisational issues related to information security behaviour, and (4) counterproductive computer usage [5]. It is ?nally individuals who make decisions in any information security implemen- tation, but most of the home users’ security decisions are limited to their technological solutions. 186 F. Foroughi and P. Luksch Having improved security controls does not mean they are free from risk. West proves that individuals maintain an appropriate level of risk and danger [2]. In the home security context, it means a security control implementation or improvement will increase the users’ risky behaviour. Technical security controls in?uence the users’ actions by providing security func- tions and mechanisms, but human factors also a?ect individual’s decisions. Human factors are including motivation, knowledge, attitude, values and so on. The quality and accuracy of risk perception impacts users’ awareness, consciousness and behaviour and motivate them to take proper action in an information security management system [8]. In addition, any awareness program and education plan depend on the views of facili- tating the people to make relevant and e?ective security choices and thus achieve greater suitable information security consequences [9]. 2.1 Security Awareness Program When a home user is in a security situation or having a risky behaviour, having appro- priate skills or knowledge against the threat would lead the user to play an active role. The con?dence based on appropriate solutions will push users to choose adaptive behaviours more than maladaptive actions [10]. Awareness training generally includes security situations that may occur, the risks confronted, fundamental methods of security, how to build e?ective security behaviour, and recommended resources and support in a security scenario. Within the home security context, users are able to decide whether and how to carry out security actions because their options and alternatives are voluntary and subjective. To follow the decision-making process and to analyse the situation, researchers recognise ?ve factors that in?uence users’ decisions in computer security situations. These factors are [3]: (1) Recognition, awareness and consciousness of safe practices. (2) Recognition, awareness and consciousness of possible negative consequences of unsafe actions. (3) Recognition, awareness and consciousness of possible supportive resources for safe practices. (4) Probably and likelihood of negative consequences. (5) Cost of consequences. 
These ?ve factors could be categorised in two general divisions: (1) Awareness and knowledge of risks as well as consequences. (2) Awareness and understanding of defen- sive and protective measures [3]. Therefore, to provide an e?ective security awareness program, it is signi?cant to support human factors that in?uence users’ decisions. Home users like other individuals are unmotivated and have a limited capacity for information processing speci?cally in multitasking scenarios. Users need the motivation to improve their capabilities. When a user has to evaluate alternative options in a situation to make the best choice and decision, results which are actually abstract in nature such as security and protection A Multi-agent Model for Security Awareness 187 are likely to be less persuasive compared to those that are concrete. Consequently, users need to have a concrete understanding of security de?nitions [11]. In a typical and normal learning position, a behaviour is formed by positive rein- forcement whenever take action “right”. Hence, users need feedback and learning form speci?c and particular security-related decisions and not just common protection or dangerous choices. The protection and safety measure gain is generally conceptual but negative e?ects, and consequences are stochastic, costly and immediate. Accordingly, users should be able to evaluate any security and risk trade-o?. Furthermore, security bene?t and gain are usually intangible or conceptual, but in the opposite, security cost or losses values are more probable [12]. Because of this, cost and loss perception are more important in?uence factors than gain and bene?t when individuals try to evaluate security risks. However, Tversky and Kahneman proved that individuals are a lot more likely to stay away risk when options are provided as bene?ts and take risk when alternatives are presented as losses [13]. They also con?rmed that when users perceive a gain and loss to have the very same bene?t, the loss is considerably more motivating in choosing alternatives (Fig. 1). For example, online shoppers respond more properly to the understanding of likelihood and chance of negative threats than to awareness of the threats themselves [3]. Fig. 1. Losses carry more value compared to gain when both are perceived as equal. The fear manipulation will in?uence the perceived intensity of the risk and threat. In addition, an increase of fear appeals will improve the chance and likelihood of a threat to be realised. Rewards could be an individual pleasure or a ful?lment by peers. The social acceptance might also be a kind of rewards. Fear awaking could adjust both threat (risk) perception and threat (risk) probability. Therefore, providing threat (risk) evalu- ation is considered to prevent maladaptive reactions [14]. As it is discussed, users usually feel they are at less risk than others. Based on these ?ndings, it is almost always necessary to improve and enhance users’ risk perception awareness to increase their security and protection compliance. Raising risk perception and understanding might also be corporate and comprehen- sive to decrease the probability and chance of security policy violation. It means home user security awareness should be assembled to produce su?cient information and 188 F. Foroughi and P. Luksch knowledge and support all family members or individuals who share a computer to eliminate security risks. 
Clearly, in case home users have to take extra measures and steps to increase their level of protection, it should not be di?cult, and the cost of applying and employing security controls should be reduced as much as possible with e?cient support. 3 User Pro?ling Computer security awareness and training has to be personalised to produce the home user with a su?cient and e?ective learning experience lined up with his/her day-to-day occupation, activities, time availability, interests, generation and connection with owned technology. The capability of data analysis to correlate information and data from a broad range of sources across substantial time periods could bring out a clear and e?cient under- standing of home users’ activities and behaviours. By using this analysis concerning big data sources, makes security awareness program able to categories users in di?erent risk groups and provide the appropriate information and training. For this reason, recognising user behaviour in real time is an important element of providing relevant information and help to take suitable action or decision. It is possible to employ user modelling to make this process automatic by using an application or intelligent agent [15]. It is proved that the user should be realised in a variety of contexts. Therefore, a context-aware system should be utilised to identify user context in a certain time period [16]. This aspect drives the idea of using data science and machine learning to automate the user behaviour analysis to provide a data-driven decision-making model. A home user could be recognised in cyberspace by a digital pro?le [3]. A research by Weber et al. proves that a user pro?le presents (1) the user’s behavioural patterns or preferences, (2) the user’s characteristics, (3) the user’s skills, and (4) the cognitive process that a user chooses an action [17, 18]. The primary function of user pro?ling is capturing user’s information about interest domain. This information may be used to understand more about individual’s knowledge and skills and to improve user satisfaction or help to make a proper decision. The user pro?le consists of all information about a user that could be known by the system. User pro?ling is usually either knowledge-based or behaviour-based. The knowl- edge-based strategy uses statistical models to categorise a user in the closest model based on dynamic attributes. The behaviour-based strategy employs the user’s behaviours and actions as a model to observe bene?cial patterns by applying machine learning techniques. Real-time user behaviour analysis requires on-line monitoring to predict users actions. These behav- iours could be extracted through monitoring and logging tasks [19]. Batch analysis or o?-line monitoring could be carried out in time intervals or after a task has been ?nished by a user in accordance with statistical parameters of user actions. Using online and o?- line monitoring modes together provide both statistical and dynamical analysis of user actions [20]. A Multi-agent Model for Security Awareness 189 Generally, a user pro?ling begins with user’s data retrieval and data collection. Collecting user information (actions details) is the ?rst step to create a user model. It includes “what” information required and “how” to collect relevant information. Data gathering model could be explicit or implicit [21]. 
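As a minimal illustration of the behaviour-based strategy just described, the sketch below turns implicitly logged user actions into simple numeric features and groups sessions with an off-the-shelf clustering algorithm. The feature names, the toy action log and the choice of k-means are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

def extract_features(actions):
    """Turn a list of implicitly logged (action, resource) pairs into a feature vector."""
    counts = Counter(kind for kind, _ in actions)
    total = max(len(actions), 1)
    return [
        counts["browse"] / total,      # share of browsing activity
        counts["download"] / total,    # share of downloads
        counts["install"] / total,     # share of software installs
        counts["settings"] / total,    # share of security-setting changes
    ]

# Hypothetical sessions captured by silent (implicit) monitoring
sessions = [
    [("browse", "news"), ("browse", "bank"), ("settings", "firewall")],
    [("download", "freeware.exe"), ("install", "freeware.exe"), ("browse", "shop")],
    [("browse", "school"), ("browse", "video"), ("download", "homework.pdf")],
]

X = np.array([extract_features(s) for s in sessions])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("behaviour group per session:", model.labels_)
```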
Explicit model means the computer user should be encouraged to provide a speci?c amount of information, but just a few number of users participate in such a process and furthermore, the provided information also has poor quality. Another signi?cant point, if keeping data up to date is necessary, this data collection model becomes more chal- lenging [22]. Implicit data collection model is a “silent” process to collect information through analysing observed users’ actions and reactions in a computer interaction environ- ment [22]. A hybrid pro?ling model considers both static characteristics and features of a user and also, tries to retrieve the behavioural information about the user. This strategy creates a more e?cient pro?le and maintains the accuracy of user data by keeping it up to date. A major attribute of discovery through observation is user’s change adaptation. It means, when user’s interest, preferences, habits and goals are changed over the time, these changes could be re?ected in the user pro?le to keep it updated. This attribute is possible by using pro?ling techniques which adapt and adjust the content of user pro?les when new observation data arrived. User feedback could also play an essential role in this particular process [23]. Collecting a wide range of user’s data creates speci?c challenges and needs an infra- structure to support several requirements including security, privacy and performance. The data collection should be transparent as much as possible with minimum user inter- action. It also should not make the limitation on system computing or network perform- ance. Because the behaviour analysis model may require a di?erent type of data over a time period, data collector architecture should be ?exible to cover various sensor types and technologies on di?erent platforms. 4 Multi-agent Model Multiple heterogeneous software entities (agents) that interact with each other directly or indirectly in a complex system with common or common or con?icting goals build a multi-agent system. [24]. A direct communication might be via messaging, and indirect communication could be through making an e?ect on the environment which the other agent(s) can sense it [25]. An agent provides noticeable characteristics including autonomous, social (interact with other agents), reactive, proactive, trustworthiness, rationality and learning. Reac- tivity character makes agents able to provide ongoing interaction with the system. Agents are proactive and rational which develops agent behaviour in accordance with its goal [20]. 190 F. Foroughi and P. Luksch The environment that home user interacts with a computer is continual, observable, dynamic, accessible and non-deterministic. This complex environment requires a multi-agent system to provide an infrastructure which agents could interact with each other to achieve the system goal. An intelligent agent is an ideal rational agent that provides actions to reach the highest level of performance measure by using provided evidence and built-in knowledge. The performance measure determines principles of success but should be carefully de?ned to concern con?icting criteria. A rational agent is an agent that performs right actions to achieve its goal as successful as possible. It means that a rational agent has to be reasonable, sensible and provides good judgment. 
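The toy sketch below illustrates the two interaction styles mentioned above: direct communication by message passing between agents, and indirect communication by modifying a shared environment that other agents can sense. Class, method and agent names are hypothetical and only echo the roles described later in the proposed model.

```python
class Environment:
    """Shared state that agents can modify and sense (indirect communication)."""
    def __init__(self):
        self.blackboard = {}

class Agent:
    def __init__(self, name, env):
        self.name = name
        self.env = env
        self.inbox = []

    # Direct communication: send a message straight to another agent.
    def send(self, other, message):
        other.inbox.append((self.name, message))

    # Indirect communication: leave information in the environment ...
    def post(self, key, value):
        self.env.blackboard[key] = value

    # ... which another agent can later sense.
    def sense(self, key):
        return self.env.blackboard.get(key)

env = Environment()
ui, profiler = Agent("UI", env), Agent("UserProfiler", env)

ui.send(profiler, {"event": "new_activity_log", "entries": 42})  # direct
ui.post("risk_context", "browsing")                              # indirect
print(profiler.inbox, profiler.sense("risk_context"))
```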
Rationality depends on performance measures (determines the level of success), agent perception from the past (prior knowledge), agent understanding about the environment (perception sequence) and possible actions [26]. The perform- ance measure de?nes the criterion of success for an agent. An intelligent agent is based on learning model to run the inference engine. Feature extraction block receives information from sensors to extract useful features and then it will send them to the inference engine. The trained inference engine uses this information based on learning model to predict a result. The learning model is constructed by a machine learning algorithms [27]. The inference engine provides a decision and sends it to the actuator. The actuator is responsible for performing necessary action(s). Machine learning, stored knowledge and condition rules are typical techniques to make an agent intelligent. Machine learning imparts intelligence by using labelled data and training process. This approach makes it possible to extract patterns and relation- ships to predict unknown data to solve the problem [27]. A distributed multi-agent architecture could supply the ?exibility of providing required functions in the necessary locations. It also requires less programming chal- lenges and system control by employing global objectives to supply necessary knowl- edge and experiences that make agents able to solve complex problems by more autonomy [28]. Distributed multi-agent system by using cloud computing power is a combination of distributed independent, autonomous and incomplete agents that work together to address a complex global issue with no need of centralised system control [28]. In this cloud architecture, data is decentralised, and computing is asynchro- nous [29]. This architecture lets devices implement more features with limited storage and processing capabilities. The proposed model (Fig. 2) tries to develop an architecture by integrating cloud computing approach and multi-agents architecture to provide a dynamic, ?exible, robust and scalable intelligent system. A Multi-agent Model for Security Awareness 191 Fig. 2. Proposed multi-agent model. In this architecture, the user interface (UI) agent directly interacts with user and computer to collect required data through independent sensors and also provides relevant information including warnings, or training materials. The UI agent has sensor modules including di?erent independent sensors which are responsible for capturing users’ actions from a wide range of resources such as browser history, system settings, ?le system, network interfaces and user data (via an explicit method). It extracts relevant features and also generates data logs. These logs consist of personal and private details about a user. It is necessary to provide appropriate security measures to keep them safe and con?dential. For this reason, the UI requires two types of data storages to create and maintain (update) information. Online storage to store user data and pro?le: For security purpose, a Secured Virtual Di?used File System (SVDFS) by using private cloud is proposed. The data exchange between UI agent and cloud is also protected by a secure communication protocol using PKI. O?ine storage to store log ?les and user activities for further analysis or until trans- mitted to the server. These log ?les are stored in an encrypted container with password protection. 
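As a minimal sketch of the offline storage just described, where log entries are kept encrypted under a password until they are transmitted, the snippet below uses the Python cryptography library (Fernet with a PBKDF2-derived key). The library choice, file names and entry fields are assumptions for illustration, not the authors' implementation.

```python
import base64, json, os, time
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def key_from_password(password: str, salt: bytes) -> bytes:
    # Derive a Fernet key from the user's password and a stored random salt
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
    return base64.urlsafe_b64encode(kdf.derive(password.encode()))

def append_encrypted_log(path: str, password: str, entry: dict) -> None:
    """Encrypt one activity-log entry and append it to the offline container."""
    salt_path = path + ".salt"
    if not os.path.exists(salt_path):
        with open(salt_path, "wb") as f:
            f.write(os.urandom(16))
    with open(salt_path, "rb") as f:
        salt = f.read()
    token = Fernet(key_from_password(password, salt)).encrypt(json.dumps(entry).encode())
    with open(path, "ab") as f:
        f.write(token + b"\n")

append_encrypted_log("ui_agent.log.enc", "home-user-passphrase",
                     {"ts": time.time(), "sensor": "browser_history", "event": "visit"})
```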
The user pro?ler (UP) agent receives extracted features from UI and uses machine learning to process information and create (and update) user pro?le. The UP agent uses cloud computing to provide a dynamic, distributed and scalable service. The risk evaluator (RE) agent receives user pro?le information from the UP agent and also recent threats and vulnerabilities from the threat ?nder (TF) agent. According to the user pro?le which describes the necessary level of security and relevant security measures, the RE agent analyses user’s actions by utilising machine learning techniques and provides a risk level and related threat’s information to the awareness provider agent. 192 F. Foroughi and P. Luksch The awareness provider (AP) agent uses an awareness and security control repository to create appropriate awareness and training material covering threats, vulnerabilities, risk level and required protective or preventive actions. This information will be sent to the UI agent to be presented by a suitable method through visualiser modules. Figure 3 illustrates the layered architecture of the multi-agent model and the commu- nication links between agents. Fig. 3. Layered architecture of proposed multi-agent model. 5 Conclusion Home users like other individuals are unmotivated and have a limited capacity for information processing in the security situations. Users’ awareness and consciousness about security threats and alternatives motivate them to take proper action in an infor- mation security management system. An e?ective security awareness requires a concrete understanding of security de?nitions, and learning form speci?c security-related deci- sions. It should also provide security control evaluation and risk trade-o? when loss perception and cost is considerably more motivating in choosing alternatives. Risk perception awareness is a signi?cant factor to increase user’s security and protection compliance. This research has proposed a multi-agent model that provides security awareness based on users’ behaviours in interaction with home computer. Machine learning is utilized by this model to pro?le users based on their activities in a cloud infrastructure. Machine learning improves intelligent agent accuracy and cloud computing makes it ?exible, scalable and enhances performance. This research is limited to cover only home users’ requirements and awareness program is based on security risks which might be occurred in accordance of general users’ activities. Moreover, it is significant to handle a huge amount of data in an online mode and process data streams in real time. Therefore, there are many machine learning classifiers based on Neural Network (NN), Bayesian learnings, Decision trees and, statistical analysis tools which should be trained and tested in A Multi-agent Model for Security Awareness 193 accordance with samples which will be collected through a volunteer program to find best possible online classifier. In this ?eld, the next challenge is to identify required monitoring sensors to observe users’ behaviour and provide a comparison between machine learning algorithms to achieve the best performance. References 1. Hazari, S., Hargrave, W., Clenney, B.: An empirical investigation of factors in?uencing information security behavior. J. Inf. Priv. Secur. 4(4), 3–20 (2008) 2. West, R.: The psychology of security. Commun. ACM 51(4), 34–40 (2008) 3. Howe, A.E., et al. The psychology of security for the home computer user. In: 2012 IEEE Symposium on Security and Privacy (SP). 
IEEE (2012) 4. Wash, R.: Folk models of home computer security. In: Proceedings of the Sixth Symposium on Usable Privacy and Security. ACM (2010) 5. Metalidou, E., et al.: The human factor of information security: unintentional damage perspective. Proc. Soc. Behav. Sci. 147, 424–428 (2014) 6. Bryant, P., Furnell, S., Phippen, A.: Improving protection and security awareness amongst home users. Adv. Netw. Comput. Commun. 4, 182 (2008) 7. Malcolmson, J.: What is security culture? Does it di?er in content from general organisational culture? In: 43rd Annual 2009 International Carnahan Conference on Security Technology (2009) 8. Albrechtsen, E.: A qualitative study of users’ view on information security. Comput, Secur. 26(4), 276–289 (2007) 9. Mai, B., et al.: Neuroscience Foundations for Human Decision Making in Information Security: A General Framework and Experiment Design, in Information Systems and Neuroscience, pp. 91–98. Springer, Berlin (2017) 10. Milne, G.R., Labrecque, L.I., Cromer, C.: Toward an understanding of the online consumer’s risky behavior and protection practices. J. Consum. A?airs 43(3), 449–473 (2009) 11. Borgida, E., Nisbett, R.E.: The di?erential impact of abstract vs. concrete information on decisions. J. Appl. Soc. Psychol. 7(3), 258–271 (1977) 12. Zurko, M.E., Simon, R.T.: User-centered security. In: Proceedings of the 1996 Workshop on New Security Paradigms. ACM (1996) 13. Tversky, A., Kahneman, D.: Rational choice and the framing of decisions. J. Bus. 59, S251– S278 (1986) 14. Mckenna, S.P., Predicting health behaviour: research and practice with social cognition models. In: Conner, M., Norman, P. (eds.) Open University Press, Buckingham (1996). 230 p. Elsevier, ISBN 0-335-19320-X 15. Iglesias, J.A., et al.: Creating evolving user behavior pro?les automatically. IEEE Trans. Knowl. Data Eng. 24(5), 854–867 (2012) 16. Dino?, R., et al. Learning and managing user context in personalized communications services. In: Proceedings of the International Workshop in Conjunction with AVI 2006 on Context in Advanced Interfaces. ACM (2006) 17. Weber, E.U., Blais, A.R., Betz, N.E.: A domain-speci?c risk-attitude scale: measuring risk perceptions and risk behaviors. J. Behav. Decis. Mak. 15(4), 263–290 (2002) 18. Iglesias, J.A., Ledezma, A., Sanchis, A.: Evolving systems for computer user behavior classi?cation. In: 2013 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS). IEEE (2013) 194 F. Foroughi and P. Luksch 19. Middleton, S.E., Shadbolt, N.R., De Roure, D.C.: Ontological user pro?ling in recommender systems. ACM Trans. Inf. Syst. 22(1), 54–88 (2004) 20. Kussul, N., Skakun, S.: Intelligent system for users’ activity monitoring in computer networks. In: Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS 2005. IEEE (2005) 21. Schölkopf, B., et al.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001) 22. Ouaftouh, S., Zellou, A., Idri, A.: User pro?le model: a user dimension based classi?cation. In: 2015 10th International Conference on Intelligent Systems: Theories and Applications (SITA). IEEE (2015) 23. Schia?no, S., Amandi, A.: Intelligent user pro?ling. In: Arti?cial Intelligence an International Perspective, pp. 193–216. Springer (2009) 24. Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge (2008) 25. Maes, P.: Pattie Maes on software agents: humanizing the global computer. 
IEEE Internet Comput. 1(4), 10–19 (1997) 26. Stuart, R., Peter, N.: Arti?cial Intelligence-A Modern Approach, vol. 3. California, Berkeley (2016) 27. Joshi, P.: Arti?cial Intelligence with Python. Packt Publishing, Birmingham (2017) 28. Rodríguez, S., et al.: Cloud computing integrated into service-oriented multi-agent architecture. In: Balanced Automation Systems for Future Manufacturing Networks, pp. 251– 259. Springer, Berlin (2010) 29. Wooldridge, M.: An Introduction to Multiagent Systems. Wiley, London (2009) A Multi-agent Model for Security Awareness 195 Light Weight Cryptography for Resource Constrained IoT Devices Hessa Mohammed Zaher Al Shebli(?) and Babak D. Beheshti(?) New York Institute of Technology, Old Westbury, NY 11568, USA Babak.beheshti@nyit.edu Abstract. The Internet of Things (IoT) is going to change the way we live dramatically. Devices like alarm clocks, lights and speaker systems can inter- connect and exchange information. Billions of devices are expected to be inter- connected by the year 2020, thus raising the alarm of a very important issue ‘security’. People have to be sure that their information will stay private and secure, if someone hacked into your medical device (hand watch) he will be able to view all your medical records, and he could be able to use it against you. If one device is hacked your entire network is going to be compromised. Transmitting your information securely between IoT devices using traditional crypto algo- rithms are not possible because those devices have limited energy supply, limited chip area and limited memory size; because of those constraints a new type of crypto algorithm came into place: the light weight crypto algorithms. As the name implies those algorithms are light and can be used in those devices with low computational power. In this paper, we start by describing some of the heavy ciphers. We also highlight some lightweight ciphers and the attacks known against them. Keywords: Light weight cryptography · IoT devices · Grain cipher Present cipher · Hight cipher 1 Introduction Security is the key concern on the technology world. With the rapid increase in the number of devices connecting to the internet these days, transmitting con?dential infor- mation in a secure manner is what people try to achieve when they use encryption. Encryption is the term used to hide the context of the original message (using an encryp- tion algorithm and a key) so only the intended user can decrypt it and read it. Figure 1 illustrates the basic ?ow of information through encryption. Encryption algorithms are divided into two main categories, symmetric algorithms and asymmetric algorithms; where the symmetric algorithms mean using only one key to perform both the encryption and decryption process. While the asymmetric algorithms use two keys (public and private) one to encrypt and the other to decrypt. Symmetric algorithms are also divided into two main groups stream ciphers and block ciphers, from their name indicates stream ciphers encrypts a bit by bit, while the block cipher encrypts a bunch of bits together. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 196–204, 2019. https://doi.org/10.1007/978-3-030-02686-8_16 Fig. 1. Encryption process. In this paper we will start by introducing the general categories of symmetric key and assymetric key crypto algorithms. We will then proceed to survey some leading light weight algorithms. 
For each algorithm we introduce the fundamental structure, followed by the attacks that have been studied against it. At the end we present a comparison of performance parameters for these algorithms. 1.1 Symmetric Algorithms – Block Cipher (AES) We take AES as an example of a symmetric algorithm, since it is the most widely used algorithm today. AES stands for Advanced Encryption Standard; it is also known as Rijndael, its original name [10]. AES encrypts a fixed block size of 128 bits and has a key size of 128, 192, or 256 bits. AES was developed to replace DES, which had become vulnerable to brute-force attacks. AES encrypts blocks of data in a number of rounds that depends on the key size; for example, a 256-bit key requires 14 rounds [8]. The relation between the number of rounds and the key size is illustrated in Table 1.
Table 1. Number of rounds (R) in relation to cipher key size
  No. of rounds   Key size (bits)
  10              128
  12              192
  14              256
For encryption, each round of processing includes four steps: byte substitution, shift rows, mix columns, and add round key. All rounds are identical except for the last one. One round is shown in Fig. 2. Fig. 2. AES encryption round steps. The byte substitution step replaces each byte with a byte from a 16 × 16 lookup table. The shift rows step consists of shifting the rows of the state to the left: the first row is not shifted, while the second row is shifted by one byte, the third row by two bytes, and the fourth row by three bytes (Fig. 3). Fig. 3. Shift rows. No practical attacks against full AES are known, but AES requires considerable power and chip area for the encryption and decryption process. While this is not an issue in devices such as workstations and laptops, it is a concern for small devices that have to save power and have limited chip area. AES with a key size of 128 bits requires about 3,400 GE¹ of chip area, whereas only about 2,000 GE of chip area is typically allocated for security in an IoT device [9]. ¹ A gate equivalent (GE) is a unit of measure that allows the complexity of digital electronic circuits to be specified independently of the manufacturing technology. For today's CMOS technologies, the silicon area of a two-input drive-strength-one NAND gate usually constitutes the technology-dependent unit area commonly referred to as a gate equivalent. A specification in gate equivalents for a certain circuit reflects a complexity measure from which a corresponding silicon area can be deduced for a dedicated manufacturing technology (https://en.wikipedia.org/wiki/Gate_equivalent).
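To make the ShiftRows step concrete, the following is a minimal sketch (not the authors' code) that rotates each row of a 4 × 4 state as described above; for simplicity the state is represented row-wise, although the AES specification arranges it column-wise:

```python
def shift_rows(state):
    """AES ShiftRows: row r of the 4x4 byte state is rotated r bytes to the left."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

# Toy state with recognizable values, so the rotation is easy to follow.
state = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
print(shift_rows(state))
# [[0, 1, 2, 3], [5, 6, 7, 4], [10, 11, 8, 9], [15, 12, 13, 14]]
```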
1.2 Asymmetric Algorithms – RSA RSA is another widely used crypto algorithm; it is an asymmetric algorithm that uses two different but mathematically linked keys, one to encrypt (the public key) and one to decrypt (the secret key). RSA got its name from the initials of the three scientists who first publicly described the algorithm in 1977 (Ron Rivest, Adi Shamir, and Leonard Adleman). The RSA algorithm consists of two steps: 1. Key generation. 2. RSA encryption and decryption. In the key generation step (generating a public key and a corresponding private key), two large prime numbers p and q are generated. The modulus n is then obtained by multiplying these two primes. Generating the modulus is easy, but factoring it back into the two primes we used is considered hard even with today's supercomputers. Next, we calculate φ(n) using the formula φ(n) = (p − 1)(q − 1). The public key (expressed as e) is then generated by choosing a prime number in the range between 3 and φ(n). The final public key is the pair of e and n, represented as (e, n). The private key (d) is the multiplicative inverse of the public key with respect to φ(n), and is also represented as a pair (d, n). For encryption and decryption the formula F(m, k) = m^k mod n is used, where k is the public key or the private key. Asymmetric algorithms (also known as public-key algorithms) rely on mathematical operations such as factorization to be effective. These operations need a lot of resources to complete and require a large hardware footprint, making them too expensive for IoT devices.
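As a quick illustration of these RSA steps, here is a toy sketch with deliberately tiny, insecure parameters (real deployments use moduli of 2048 bits or more); it is not the authors' implementation, just the textbook formulas written in Python:

```python
# Toy RSA with tiny primes -- for illustration only, never for real use.
p, q = 61, 53
n = p * q                   # modulus n = 3233
phi = (p - 1) * (q - 1)     # phi(n) = 3120
e = 17                      # public exponent, coprime to phi(n)
d = pow(e, -1, phi)         # private exponent: modular inverse of e, here 2753

m = 65                      # message encoded as an integer smaller than n
c = pow(m, e, n)            # encryption: c = m^e mod n
assert pow(c, d, n) == m    # decryption: m = c^d mod n recovers the message
```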
2 Light Weight Cryptography Lightweight cryptography is designed to secure the communication between IoT devices, since traditional cryptographic algorithms are not an option. IoT devices (also known as constrained devices) have constraints when it comes to speed, power consumption, area, processing, memory space and size [14]. The challenge is to reduce some of the algorithm parameters without affecting the overall security of the algorithm: the number of rounds, the key length and the processing cost have to be reduced. There are two ways to design a lightweight cryptographic algorithm: the first is to develop it from scratch, like the PRESENT cipher, and the second is to optimize the functionality of an existing traditional cryptographic algorithm such as AES or RSA. Lightweight algorithms fall into two main categories, hardware-oriented and software-oriented, based on the requirements of the cipher. Hardware-oriented ciphers are used when we are concerned about the number of clock cycles and the chip size, while software-oriented ciphers are used when we are concerned about memory space and power consumption. A standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) started working on the lightweight cryptography project. ISO/IEC 29192 is the known standard for lightweight cryptography; its parts two and three specify block ciphers and stream ciphers, respectively. Some lightweight ciphers are introduced below. 2.1 Grain (Stream Cipher) In this bit-oriented synchronous stream cipher, the keystream is generated independently of the plaintext. The cipher operates in two phases: the first phase initializes the internal state using the secret key and the initialization vector [7]; in the second phase, the state is repeatedly updated and used to generate keystream bits. There are two variants, Grain v1 and Grain-128. The overall algorithm block diagram is illustrated in Fig. 4. Fig. 4. Grain v1 algorithm. Grain v1 uses an 80-bit key, receives a 64-bit initialization vector and needs 160 initialization cycles; Grain-128 uses a 128-bit key, receives a 96-bit initialization vector and needs 256 initialization cycles. Figure 4 shows the basic structure of the Grain v1 algorithm. "f" and "g" are two feedback polynomials (functions) of degree 80; they provide the feedback for the two shift registers, the linear feedback shift register (LFSR) and the non-linear feedback shift register (NFSR). The filter polynomial "h" uses selected bits from both feedback shift registers. Bits from the NFSR are XORed and then added to the output of "h"; during the initialization phase this output is fed back into the LFSR and the NFSR (shown by the light blue lines in Fig. 4), while during normal operation it is released as the keystream. More precisely, one bit of the non-linear feedback register and four bits of the linear feedback register are supplied to the non-linear 5-to-1 filter function, and its output is linearly combined with 7 bits of the linear feedback register and released as output. 2.2 Present (Block Cipher) PRESENT is an ultra-lightweight block cipher with a block length of 64 bits, two key lengths of 80 and 128 bits, and 31 rounds [6]. Its block diagram is shown in Fig. 5. Fig. 5. Present cipher. The PRESENT design takes its characteristics from the Serpent cipher (the non-linear substitution layer, sBoxLayer) and from DES (the linear permutation layer, pLayer). There are three stages involved in PRESENT: the first stage is addRoundKey, the second stage is sBoxLayer, and the third stage is the bit permutation pLayer [6]. Figure 5 shows that each of the 31 rounds contains an XOR operation to introduce a round key Ki, with 1 ≤ i ≤ 32, where K32 is used for post-whitening (post-whitening combines the data with portions of the key to increase the security of the block cipher), a linear bitwise permutation, and a non-linear substitution layer. The non-linear layer uses a 4-bit to 4-bit S-box which is applied 16 times in parallel in each round. The key can be 80 or 128 bits long; it is stored in a key register K and written in descending order as k79 k78 … k0. At round i, the 64-bit round key Ki, denoted k63 k62 … k0, consists of the leftmost 64 bits of the contents of the K register. After the round key Ki has been extracted, the K register is rotated by 61 bit positions to the left, the leftmost 4 bits are passed through the S-box, and the round counter value i is XORed with bits in the low-order part of the register; the whole operation is then repeated for the next round.
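A minimal sketch of this 80-bit key-register update, assuming Python integers as the register and a placeholder S-box (the real 4-bit S-box and the exact counter bit positions are given in the PRESENT specification [6]):

```python
SBOX = list(range(16))        # placeholder 4-bit S-box; the real one is defined in [6]
MASK80 = (1 << 80) - 1

def update_key_register(k: int, round_counter: int) -> int:
    """One PRESENT-80 key-schedule step as described above (illustrative sketch)."""
    k = ((k << 61) | (k >> 19)) & MASK80   # rotate the 80-bit register left by 61 positions
    top = SBOX[(k >> 76) & 0xF]            # pass the leftmost 4 bits through the S-box
    k = (k & ~(0xF << 76)) | (top << 76)
    return k ^ (round_counter << 15)       # XOR the counter into low-order bits (assumed positions 19..15)

def round_key(k: int) -> int:
    """Round key K_i: the leftmost 64 bits of the register."""
    return k >> 16
```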
2.3 Hight (Block Cipher) HIGHT is a lightweight encryption algorithm that was proposed one year before the PRESENT cipher. It consists of 32 rounds and makes use of XOR and addition modulo 256 (2^8) operations, which allow it to achieve good performance in hardware [1]. Figure 6 shows the cipher block diagram. Fig. 6. Hight cipher. HIGHT has a block size of 64 bits and a 128-bit key length. The encryption starts with an Initial Transformation (IT) that is applied to the plaintext together with input whitening keys (WKs) [11]. After the last round, a Final Transformation (FT) is applied to the output of that round together with output whitening keys in order to obtain the ciphertext [1]. The plaintext is divided into 8 bytes denoted P = P7, P6, …, P0; likewise, the ciphertext is divided into 8 bytes denoted C = C7, C6, …, C0. The 64-bit intermediate values are represented as Xi = Xi,7, Xi,6, …, Xi,0, and the master key is divided into 16 bytes denoted MK = MK15, MK14, …, MK0 [13]. The key schedule consists of two algorithms: one generates the whitening keys (WK) and the other generates the subkeys (SK). The operation uses 8 whitening keys, 4 for the initial transformation and another 4 for the final transformation. 128 subkeys are generated throughout the process, 4 of which are used in each round. In the initial transformation, the first intermediate value X0,0 is obtained from P0 by addition modulo 2^8 with the whitening key WK0; the second intermediate value X0,1 is taken from P1; the third intermediate value X0,2 is obtained from P2 combined with the whitening key WK1; and this pattern is repeated up to X0,7 and P7 [12]. In each round, Xi is transformed into Xi+1; for example, Xi+1,0 = Xi,7 XOR (the output of an auxiliary function applied to Xi,6, added modulo 2^8 to the subkey SK4i+3), and this repeats for every word up to X32,0. The final transformation is the same as the initial transformation, but with the plaintext notation P replaced by the ciphertext notation C, the whitening keys WK4 to WK7 instead of WK0 to WK3, and the intermediate values X32,0 to X32,7. 3 Comparison Between the Algorithms Since PRESENT and HIGHT are both block ciphers, it is fair to compare them to each other. Table 2 lists key performance criteria for PRESENT and HIGHT.
Table 2. Comparison between the PRESENT and HIGHT ciphers
  Algorithm   Key size (bits)   Area (GE)   RAM requirement (bytes)
  PRESENT     80                1570        142
  HIGHT       128               3048        18
We assume a block size of 64 bits for both algorithms. Table 2 shows that the PRESENT cipher does not need as much area as the HIGHT cipher does [2]. As for the stream cipher Grain, Table 3 shows its area requirement.
Table 3. Grain cipher
  Algorithm   Key size   Area (GE)
  Grain       80 bits    1294
4 Attacks Against Lightweight Algorithms The designers of the PRESENT cipher presented security margins for differential, linear and algebraic cryptanalysis. Since then it has been discovered that 32% of PRESENT keys (80-bit key size) are weak with respect to linear cryptanalysis. In 2009 a study of linear hull and algebraic cryptanalysis was conducted for PRESENT; it proposed a linear attack on 25 rounds of PRESENT (128-bit key size) and an algebraic attack on 5 rounds of PRESENT (80-bit key size). A year after this study, an attack on 25-round PRESENT was proposed that can recover the 80-bit secret key with a data complexity of 2^62.4 [3]. In linear cryptanalysis, an attacker tries to find biased linear approximations for the non-linear components of a cipher (e.g., an S-box) and then uses them to find a biased linear approximation for the entire cipher. These biased approximations can then be used to recover certain subkey bits; afterwards, the remaining key bits are recovered by brute force [4]. One of the studied attacks against the PRESENT cipher is a statistical saturation attack that takes advantage of a weakness in its diffusion layer. PRESENT can also be exploited using a differential key attack [5]. 5 Conclusion and Future Work Resource-constrained devices such as RFID (radio-frequency identification) tags are entering our lives more and more because of their low prices, and the need for cryptographic solutions for them is pressing. While many ciphers have been proposed, their security has to be studied continuously against emerging attacks. In this paper we highlighted three lightweight algorithms and compared them; we also touched on possible attacks against the PRESENT cipher.
For future work, we plan to simulate several key lightweight crypto-algorithms on multiple embedded platforms and pro?le their performance. These performance comparisons will be important to recognize each algorithm’s internal computation a?nity to speci?c CPU architectures. References 1. Ozen, O., Varici, K., Tezcan, C., Kocair, C.: Lightweight Block Ciphers Revisited: Cryptanalysis of Reduced Round PRESENT and HIGHT. http://citeseerx.ist.psu.edu 2. Bogdanov, et al.: PRESENT: An Ultra-Lightweight Block Cipher. http://lightweightcrypto.org Light Weight Cryptography for Resource Constrained IoT Devices 203 3. Lacko-Bartošová, L.: Algebraic Cryptanalysis of Present Based on the Method of Syllogisms. www.sav.sk 4. Bulygin, S.: More on linear hulls of PRESENT-like ciphers and a cryptanalysis of full-round EPCBC–96. http://eprint.iacr.org 5. Collard, B., Standaert, F.X.: A Statistical Saturation Attack against the Block Cipher PRESENT. http://citeseerx.ist.psu.edu 6. Aura, T.: Cryptoanalysis of Lightweight Block Ciphers, November 2011. http://into.aalto.?f 7. Grain: A Stream Cipher for Constrained Environments (n.d.). https://cr.yp.to 8. Block and Stream Cipher Based Cryptographic Algorithms: A Survey (n.d.). www.ripublication.com 9. Simon and Speck: Block Ciphers for the Internet of Things, July 2015. https://csrc.nist.gov 10. Single-Cycle Implementations of Block Ciphers (n.d.). https://csrc.nist.gov 11. Han, B., Lee, H., Jeong, H., Won, Y.: The HIGHT Encryption Algorithm draft-kisa-hight-00”, November 2011. https://tools.ietf.org 12. Impossible Di?erential Cryptanalysis of the Lightweight Block Ciphers TEA, XTEA and HIGHT (n.d.). https://eprint.iacr.org 13. IP Core Design of Hight Lightweight Cipher and Its Implementation (n.d.). http://airccj.org 14. Rekha, R., Babu, P.: On Some Security Issues in Pervasive Computing: Light Weight Cryptography”, February 2012. http://www.enggjournals.com 204 H. M. Z. Al Shebli and B. D. Beheshti A Framework for Ranking IoMT Solutions Based on Measuring Security and Privacy Faisal Alsubaei1,2(&) , Abdullah Abuhussein3 , and Sajjan Shiva1 1 University of Memphis, Memphis, TN 38152, USA {flsubaei,sshiva}@memphis.edu 2 University of Jeddah, Jeddah, Saudi Arabia 3 St. Cloud State University, St. Cloud, MN 56301, USA aabuhussein@stcloudstate.edu Abstract. Internet of Medical Things (IoMT) is now growing rapidly, with Internet-enabled devices helping people to track and monitor their health, early diagnosis of their health issues, treat their illness, and administer therapy. Because of its increasing demand and its accessibility to high Internet speed, IoMT has opened doors for security vulnerabilities to healthcare systems. The lack of security awareness among IoMT users can provoke serious and perhaps fatal security issues. The disastrous consequences of these issues will not only disrupt medical services (e.g., ransomware) causing ?nancial losses but will also put the patients’ lives at risk. This paper proposes a framework to compare and rank IoMT solutions based on their protection and defense capability using the Analytic Hierarchy Process. The proposed framework measures the security, including privacy, in the compared IoMT solutions against a set of user requirements and using a detailed set of assessment criteria. This works aims to help in determining and avoiding risks associated with insecure IoMT solutions and reduce the gap between solution providers and consumers by increasing the security awareness and transparency. 
Keywords: IoMTQuantitative evaluationSecurityAssessment MetricsMeasurementsPrivacy 1 Introduction The Internet of Medical Things (IoMT), also known as the healthcare Internet of Things (IoT), can be described as a collection of medical devices and applications that are connected through heterogeneous networks. IoMT solutions are being utilized by many healthcare providers to facilitate the management of diseases and drugs, improve treatment methods and the patient experience, and reduce cost and errors. Currently, about a third of IoT devices are found in healthcare; this number is expected to increase by 2025, with healthcare accounting for the largest percentage (approximately 40%) of the total global worth of IoT technology ($6.2 trillion) [1]. Further, approximately 60% of healthcare organizations have already adopted IoT technologies, and that percentage is expected to rise to approximately 87% by 2019 [2]. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 205–224, 2019. https://doi.org/10.1007/978-3-030-02686-8_17 One of the most prevalent problems currently facing IoMT solutions is security fragility [3]. A survey found that of more than 370 organizations using the IoMT, approximately 35% suffered at least one cybersecurity incident in 2016 [4]. The lack of security awareness among IoMT users is a key factor for the security issues in IoMT. According to a recent survey, only 17% of connected medical device makers and 15% of medical professionals are aware of potential security issues and take serious mea-sures to prevent them [5]. This could explain why more than 36,000 healthcare-related devices in the U.S. alone are easily discoverable on Shodan, a search engine for IoT devices [6]. In addition, while there is a lack of security standards for the IoT in general, extra efforts are needed to regulate and ensure security in the IoMT. Unlike other domains, security in the medical ?eld is vital due to the sensitivity of the medical data and critical nature of the operations involved. The U.S. Food and Drug Administration (FDA) has taken steps to secure medical devices; however, only 10% of these devices are clas-si?ed under FDA Class III, which includes devices designed to support or sustain life (e.g., pacemakers) [7]. However, reduced patient wellbeing is not the only consequence of IoMT attacks, as these attacks can also have negative effects on medical data privacy, brand reputation, business continuity, and ?nancial stability. Moreover, there is a lack of consensus among stakeholders in healthcare organi-zations regarding security requirements [8]. This dissension and the lack of security awareness leaves adopters unsure about which security features are relevant to their solutions [9]. IoMT adopters usually are compelled to accept the default security in solutions. Adopters should instead be able to measure and verify security themselves to make well-educated decisions. It is also important to enable adopters to select security features based on their requirements (i.e., priorities) because security goals depend not only on the scenario but also on the assets and tolerance to risks. Due to the rapid evolution of IoMT technologies, there is a need to introduce a structured quantitative model that is expandable and offers opportunities to improve security. Thus, we propose a framework to assess the security and privacy levels provided in IoMT solutions using the Analytic Hierarchy Process (AHP). 
The proposed framework allows users to make knowledgeable choices when obtaining new or enhancing existing IoMT solution. It also allows adopters to de?ne their security priorities that reflect their security objectives and utilize them to rank prospective solutions in terms of security. The AHP-based method uses a list of detailed security assessment criteria collected by examining security controls published by specialized organizations such as the Open Web Application Security Project (OWASP), the International Organizations of Standardization (ISO), FDA among others. In addition, our method uses previous IoMT attacks and available IoMT solutions. The rest of this paper is organized as follows: the literature for measuring the security in IoMT is discussed briefly in Sect. 2. Section 3 presents the assessment criteria used in the framework. The security assessment method is demonstrated in Sect. 4. Section 5 presents a case study of the framework by assessing the degree of security in real IoMT solutions. Sections 6 and 7 discuss the evaluation and limitations, respectively. Lastly, in Sect. 8, we draw concluding remarks and outline some future works. 206 F. Alsubaei et al. 2 Related Work This section surveys previous work in assessing the security of IoMT solutions. The main gaps in the current literature can be summarized as follows: • The assessment criteria are speci?c to a set of IoMT scenarios (e.g., patient mon-itoring) [10, 11]. • The security recommendations are abstract and target only manufacturers who primarily focus on one part of the IoMT (e.g., devices) to the exclusion of others, such as mobile and back-end [12–15]. • Lack of an assessment model that helps adopters, according to their security pri-orities, to quantify and compare the security of potential IoMT solutions [16–22]. • The focus is only on assessing existing solution(s) by utilizing post-deployment parameters such as con?gurations and current users’ feedback, which requires technical knowledge that often most IoMT users lack [14, 23, 24]. Despite the fact that these works are viewed as a valuable contribution, they cannot be incorporated ef?ciently into an assessment method for the IoMT. They also do not provide a practical assessment method that considers the user security priorities. In this paper, we build upon and complement the past efforts by proposing a framework for quantifying security in IoMT solutions that is twofold: (1) a detailed list of security assessment criteria that includes over 200 assessment questions for IoMT security. These questions were gathered by examining the IoMT security considerations from different sources and IoMT solution providers. (2) an AHP-based security assessment method for IoMT solutions utilizing the assessment criteria. The proposed framework enables users to rank candidate IoMT solutions based on their security to help them in making educated decisions. The importance of our framework lies in its ability to aid adopters in selecting or improving current IoMT solutions considering their security priorities. 3 Assessment Criteria Because of the rapid development in the IoT technologies and therefore the complexness of IoMT, it is imperative to design a simple-to-use and elaborate list of assessment criteria that considers any IoMT solution. Therefore, we utilize the goal-question-metric (GQM) approach while designing the assessment criteria [25]. 
GQM is a popular approach to measure assessment goals by identifying questions and developing metrics to answer the questions [26]. These metrics are then used to ensure that the goals are met. As in Fig. 1, GQM is utilized in our framework such that for every IoMT com-ponent, there is a list of yes/no questions and corresponding answers (i.e., metrics). A small sample of the assessment criteria is shown in Table 1 and organized as. A Framework for Ranking IoMT Solutions 207 Fig. 1. Typical IoMT components. Table 1. A sample of the assessment criteria. Component Security feature Question Goal Sub-goal Secure Endpoint (E) 1. Intrusion prevention 1. Can the IoMT ecosystem detect endpoints that are connecting to abnormal service, or connecting to service at unusual times? 2. Can IoMT ecosystem detect endpoints leaving or joining a communication network at erratic intervals? 3. Can endpoint devices detect a signi?cantly abnormal network traf?c ?ngerprint of other devices? 4. Do endpoint devices have secure event logging? 2. Strong authentication 1. Do endpoint devices require users to authenticate themselves before using/access any function? 2. Do the endpoint devices provide mechanisms to prevent brute force attacks? 3. Do endpoint devices use cryptographic certi?cates for self-authentication or to verify the broker identity of a user? 4. Does the IoMT ecosystem ensure that no hardcoding or default passwords are allowed in endpoint devices? 3. Secure updates 1. Does the IoMT ecosystem provide automated alerts, via SMS or email, for available manual updates for endpoint devices? 2. Are endpoint devices updates and patches, including extensions or plugins, veri?ed (e.g., binary signing and hash values) after download and before installation to ensure their legitimacy? 3. Does the IoMT ecosystem clearly identify the endpoints software running version? (continued) 208 F. Alsubaei et al. Table 1. (continued) Component Security feature Question Goal Sub-goal 4. Protected memory Is the use of direct memory access in endpoint devices by other peripherals carefully managed and controlled? 5. Secure communications Do endpoint devices renegotiate and verify communication security keys each time it reconnects to the communication network? 6. Secure administration Do management systems distinguish between active and inactive endpoint devices? 7. Secure hardware Do endpoint devices use epoxy covering for core circuit components? 8. Secure software Are all debugging and test technologies disabled in the endpoint devices? 9. Secure web interface Is the web interface of endpoint devices presented over hyper-text transfer protocol secure (HTTPS)? 10. Secure storage Are all data stored in the endpoints’ removable media, protected cryptographically? 11. Regulatory compliance Are the medical endpoint devices approved by the FDA? 12. Secure root of trust Are the roots of trust certi?ed by FIPS or CC? Secure Gateway (G) 1. Secure communications Does the gateway provide standard bidirectional end-to-end encryption? 2. Secure storage Does the gateway cryptographically store data collected from endpoint devices? 3. Intrusion prevention Does the gateway have robust security logging of all events? 4. Secure hardware Does the gateway provide countermeasures against physical attacks? 5. Strong authentication Does the gateway cryptographically authenticate endpoint devices to different components and vice versa? 6. Secure updates Does the gateway allow for modular updates and monitoring of extensions and plugins? 7. 
Secure web interface Is the gateway’s web interface presented over HTTPS? Secure Mobile (M) 1. Secure communications 1. Are the communications in mobile devices always encrypted? 2. Intrusion prevention 1. Does the mobile provide alerts for mobile status (e.g., connectivity or power outages)? 3. Strong authentication Do mobile applications, or devices support biometrics authentication (e.g., ?ngerprint, face recognition)? 4. Secure updates Are mobile vendor-speci?c security updates checked and installed automatically? (continued) A Framework for Ranking IoMT Solutions 209 Table 1. (continued) Component Security feature Question Goal Sub-goal 5. Secure software Is the application certi?ed and listed in vendors’ application stores (e.g., Apple App Store, Google Play)? 6. Secure web interface Is the mobile’s web interface presented in HTTPS? 7. Secure storage Does the mobile application share any data with third parties? Secure Back-end (B) 1. Secure cloud environment 1. Does the cloud services always available even during scaling up/down? 2. Does the cloud service provider hide information about the servers physical locations? 3. Does the cloud have countermeasures against data leakage in multi-user storage services? 4. Does the cloud service provider have an of?cial insider threat program? 2. Secure software 1. Does the back-end utilize an API for the application to cryptographically identify itself to its peers? 2. Are back-end third-party libraries actively monitored, managed, and audited? 3. Are the back-end applications designed to mitigate buffer errors using the operating system’s mechanisms? 3. Secure web interface Does the back-end web interface use certi?cates that are signed by a certi?cate authority? 4. Regulatory compliance Does the back-end use standard protocols and technologies? 5. Risk assessment Did the IoMT solution provider identify the assets, risk factors, and threat agents? 6. Privacy assurance Does the IoMT solution provider have a process to ensure that the privacy of individuals’ personal and medical information complies with the latest relevant privacy laws (e.g., Health Insurance Portability and Accountability Act (HIPAA), Health Information Technology for Economic and Clinical Health Act (HITECH) or the General Data Protection Regulations (GDPR), Personally Controlled Electronic Health Records Act, etc.) in effect over user control of their data? 7. Secure development lifecycle Does the IoMT solution provider validate management of the supply chain, the software, the sources of the equipment, and the purchaser and supplier aspects of the infrastructure? (continued) 210 F. Alsubaei et al. 3.1 Goals The goals are the IoMT components to be secured (?rst column in Table 1). The IoMT typical components we use, as outlined in Fig. 1, are de?ned as follows [27]: Endpoints: These are connected medical devices that typically have embedded sen-sors to collect data and forward it to the back-end servers. Based on their operating system, hardware, communication media, mobility, etc., these devices can be of various kinds but collaborate heterogeneously to perform a common task. Endpoint devices can be wearable sensors (e.g., blood pressure monitors, heart monitors, pulse oximeters), implantable devices (e.g., embedded cardiac function monitoring systems, swallowable camera capsules), ambient sensors (e.g., motion sensors, pressure sensors, room tem-perature sensors), or stationary devices (e.g., computerized tomography scanners, surgical instruments). 
Gateway: These are optional devices to support some weak endpoint devices. Some strong endpoint devices can have gateway capabilities and can serve as gateways; in this case, these devices are called border routers. Gateways act as a bridge network to aggregate the data collected from the endpoint devices and transmit it to the back-end. Because of its location, it also serves as a secure channel between the insecure, but trusted, local network and the untrusted public Internet. Table 1. (continued) Component Security feature Question Goal Sub-goal 8. Incident response Does the IoMT solution provider have an incident response procedure in place for information recovery? 9. Secure storage Are the back-end authentication credentials (i.e., usernames, passwords, device ids, etc.) salted and hashed before stored? 10. Secure communications Does the back-end have quality of service mechanisms for delivery of targeted messages to speci?c components? 11. Secure updates Does the back-end report and update service infrastructure’s third-party components (both software and hardware) regularly to ensure the latest security updates are installed once available? 12. Strong authentication Does the authentication service gather metrics to determine whether the user changed to an alternative computing platform, but still uses the former token? 13. Secure administration Does the back-end include load-balancing features and redundancy systems? 14. Intrusion prevention Do the back-end protect against malware-based attacks? A Framework for Ranking IoMT Solutions 211 Back-end: Most current IoMT environments have back-end server(s) that are often hosted on the cloud for better scalability. IoT platforms are often utilized for provi-sioning, management, and automation of endpoint devices. They also provide other common server-side tasks such as centralized data storage, backups, reports, and analytics, etc. Mobile: IoMT systems can also have mobile applications to control endpoint devices and provide limited back-end capabilities and instant alerts. Every goal (i.e., IoMT component in the ?rst column of Table 1) has sub-goals (i.e., security features in the second column) to ensure that the security goals are achieved. For instance, to secure the endpoint devices, the identi?ed sub-goals are as follows: secure administration, strong authentication, secure updates, intrusion prevention, protected memory, secure communications, secure web interface, secure hardware, secure software, secure storage, regulatory compliance, and secure root of trust. 3.2 Questions The assessment questions (third column of Table 1) were thoroughly examined and collected from various reliable resources that include: • Medical-speci?c sources, such as guidelines from the FDA [28], ISO [15], the Medical Device Risk Assessment Platform (MDRAP) assessment questionnaire [13], and the Naval Medical Logistics Command (NMLC) [29], among others. • General IoT security considerations provided by OWASP [17], the Cloud Security Alliance (CSA) [16], the Global System for Mobile Communication Association (GSMA) [19], and others [18]. • The documentation of popular IoMT solutions and their accompanying Security Level Agreements (SecLAs). The yes/no questions are less demanding and provide the answers to the respon-dent. Thus, they are quick and easy to answer and provide an accurate and consistent assessment. These questions precisely measure the different levels of security in the security features. 
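One convenient way to hold this goal / sub-goal / question hierarchy in software is as a nested data structure. The sketch below is only an illustration (Python; the class and field names are our own, not the paper's), mirroring the Component → Security feature → Question levels of Table 1:

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    text: str
    answer: int = 0          # metric: 1 = yes, 0 = no

@dataclass
class SecurityFeature:       # sub-goal, e.g. "Strong authentication"
    name: str
    questions: list[Question] = field(default_factory=list)

@dataclass
class Component:             # goal, e.g. "Secure Endpoint (E)"
    name: str
    features: list[SecurityFeature] = field(default_factory=list)

endpoint = Component("Secure Endpoint (E)", [
    SecurityFeature("Strong authentication", [
        Question("Do endpoint devices require users to authenticate "
                 "themselves before using/accessing any function?"),
    ]),
])
```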
For example, the security of encryption depends on the used algo-rithm and encryption key size. Hence, our questions consider, and quantify, such levels of security. Due to the space constraint, in Table 1 we listed only a sample list of the assessment criteria in which, only one question is included per security feature except for the questions used in the case study. The full list will be available in future publications. 3.3 Metrics A single metric is a score that depends on the question answer. Our proposed frame-work utilizes the documentation presented by solution providers to determine the metrics. Metrics measure the degree of achieving a sub-goal and, ultimately, a goal. The overall degree of security provided by a security feature is the total scores for all the assessment questions under that feature. The security features are then used to calculate the degree of security of a component. As illustrated in Fig. 3, this forms a hierarchy for the assessment. 212 F. Alsubaei et al. Fig. 2. The proposed framework flow. Fig. 3. Sample pro?le represented in hierarchy. A Framework for Ranking IoMT Solutions 213 4 IoMT Security Assessment In this section, we present an assessment method that employes the presented hierarchal list of assessment criteria and perfectly suits its hierarchal structure. IoMT security depends on multiple factors; therefore Multiple criteria decision-making (MCDM) is required such that all goals (and sub-goals) are assessed, and their scores are aggregated in a meaningful score. Hence, we use the AHP in our assessment method to achieve this task. AHP is a popular technique to solve MCDM problems [30]. What makes the AHP more suitable in this scenario than other MCDM techniques, is its flexibility as well as its ability to address inconsistencies across requirements. It also allows for composite quantitative and qualitative weighted questions to be compared easily because of its pairwise comparisons of decision criteria [31]. The pairwise results of comparisons and weights for every criterion are structured into a hierarchy. These comparisons of the questions and weights are the basis for the security assessment of IoMT solutions. As shown in Fig. 2, there are three main stages in our assessment method, which are described as follows. 4.1 De?ning Security Pro?les In this stage of the framework, security pro?les are de?ned to prepare them for comparisons in the next stage. In other words, pro?led IoMT solutions are described in terms of their security capabilities producing IoMT solution pro?le. The user desired degree of security is also captured. Thus, the output of this stage is a user pro?le that includes the user requirements (i.e., security priorities) and at least two IoMT solution security pro?les. This allows the user to (1) verify that the solution’s security matches their requirements, and (2) compare the security in two or more solutions. The two types of security pro?les are described as follows. User Requirements Pro?le. This is where IoMT users specify their desired security degree. The user assigns weights for all elements in the second, third, and/or fourth levels as in Fig. 3. This detailed pro?ling is crucial for better accuracy when comparing the relative importance of two (or more) elements within the same level. This ensures that all the user’s security priorities are met. The framework provides flexibility in assigning weights. 
It allows users to assign weights on a scale of 1 to 10 (i.e., a weight of 10 denotes that an element is extremely more important than the others, whereas 1 denotes equal importance), or binary weights (i.e., 1 or yes denotes required, and 0 or no denotes not required), or a mix of both at the various layers of the hierarchy. For example, a user may mark one component as very important, assign quantitative weights to the security features of another component, and assign Boolean (yes/no) values to a third component. A weight of 0 can be assigned to irrelevant question(s) so that they are disregarded in the assessment. Solution Profiles. To compare the degree of security in IoMT solutions, the assessment criteria questions (described in Sect. 3) should be answered for each IoMT solution individually. One can use the publicly available specifications of an IoMT solution, for example from the product FAQ page, or contact the solution provider's customer service, to answer the assessment criteria questions. For open-source solutions, security experts can be involved in answering these questions. 4.2 Security Quantification In this stage, the security profiles generated in the previous stage (i.e., the user requirements profile and the solutions' profiles) are used to assess the security of the solutions and to check whether they match the user security requirements. The terms used in the assessment are shown in Table 2. Since the questions in our assessment criteria require only yes or no answers, these values can be represented as 1 and 0. The relationship across solutions (S) of the question values (V) can be defined as a ratio:
S1/S2 = V1/V2 = 0 if (V1 = 0 ∧ V2 = 0) ∨ (V1 = 0 ∧ V2 = 1),
                1 if (V1 = 1 ∧ V2 = 1) ∨ (V1 = 1 ∧ V2 = 0).        (1)
For example, assume two IoMT solutions, S1 and S2, have values V1,q = 0 and V2,q = 1, respectively, for question q, which user U requires (thus Vu,q = 1). The pairwise comparison ratio of S1 and U is defined as V1,q/Vu,q = 0, which means that S1 does not satisfy the user requirement. However, the pairwise comparison ratio V2,q/Vu,q = 1 means that S2 fulfills the user requirement. This stage relies on the pairwise comparison matrix (CM) of the security questions in the solutions' profiles and the user requirements profile. Using a CM for a question over all profiles, we obtain a one-to-one comparison in which V1,q/V2,q denotes the relative rank of S1 over S2. If there are n IoMT solutions, the one-to-one CM (including the user requirement profile) will be of size
(n + 1) × (n + 1).        (2)
Table 2. Description of assessment terms.
  Term      Description
  q         Assessment question
  Si        Solution i, where i ∈ {1, …, n} and n denotes the number of IoMT solutions to be compared
  Vi,q      Metric (answer) of q provided by Si
  Si,q      Si provides q with value Vi,q
  U         IoMT user (adopter)
  Vu,q      The user-required value of q
  S1/S2     Relative rank ratio of S1 over S2 regarding q
  S2/S1     Relative rank ratio of S2 over S1 regarding q
  Si,q/U    Relative rank ratio of Si over U, which indicates whether Si fulfills Vu,q
4.3 Ranking The relative ranking of all the IoMT solutions for any question, known as the priority vector (PV), is derived by calculating the eigenvector of the CM. The PV transforms the CM into a meaningful vector that summarizes the results of all comparisons (ratios) into a normalized numerical ranking. The eigenvector principle in AHP is necessary to reduce human errors in the judgment process [32].
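As a rough, non-authoritative illustration of Eqs. (1)–(2) and the eigenvector step, the sketch below (Python/NumPy, with hypothetical metric values V1,q = 0 and V2,q = V3,q = Vu,q = 1) builds the (n + 1) × (n + 1) comparison matrix for a single question and normalizes its principal eigenvector into a priority vector:

```python
import numpy as np

def ratio(v_row: int, v_col: int) -> float:
    """Pairwise ratio of Eq. (1): 1 when the row profile answers 'yes' (v_row = 1),
    0 when it answers 'no', regardless of the column value."""
    return 1.0 if v_row == 1 else 0.0

# Hypothetical metrics for one question q: three solutions plus the user U.
values = {"S1": 0, "S2": 1, "S3": 1, "U": 1}
names = list(values)
cm = np.array([[ratio(values[a], values[b]) for b in names] for a in names])

# Priority vector: principal eigenvector of the CM, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(cm)
pv = np.abs(eigvecs[:, np.argmax(eigvals.real)].real)
pv /= pv.sum()
print(dict(zip(names, pv.round(3))))  # S1 -> 0.0; S2, S3 and U share the ranking
```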
A PV of this kind shows, for instance, that solutions 2 and 3 meet the user requirement U. After the PVs (i.e., rankings) for all questions have been computed, they are aggregated (from bottom to top) to determine the overall security rankings of the IoMT solutions. All the questions' PVs are combined with the relative weights assigned in the previous stage:
PV_aggregated = Σ_{j=1}^{g} w_j · PV_j        (3)
where PV_j denotes the PV of the CM of question j, w_j denotes the relative weight assigned to the question, and g is the total number of questions. If the user wants to compare the security of the underlying levels, the weights of the upper levels are not considered in the aggregation. For example, if a user wants to compare only the security features of one component, then only the weights of the security features and their corresponding questions are considered. 5 Case Study In this section, we demonstrate how the framework can be used to assess and rank three popular real-world cloud-based IoT platforms that are widely used in healthcare. We examined the SLAs and other available documentation describing the offered security to answer the questions in our assessment criteria; as a result, we have three distinct security profiles for these platforms. We consulted their customer service in order to answer the questions that we could not answer using their publicly published documentation. The questions for which we could not find relevant information are treated as answered "no", because if the answers were "yes", the providers would have used this to market the security of their products. To illustrate the flexibility of our framework, we show three examples of hypothetical weights (i.e., user requirements) for a sample of the assessment criteria, as described in Table 3 (where yes and no are denoted by 1 and 0, respectively). Case 1 In this detailed case, the user assigned Boolean weights by answering all relevant questions (i.e., yes denotes required, and no denotes not required). For every question, Eq. 1 is used to perform pairwise comparisons on its CM; this yields the CM of B.2.3. Then PV_B.2.3 is calculated by finding the normalized eigenvector of CM_B.2.3. This indicates that S1 does not satisfy the user requirement, whereas S2 and S3 fulfill it. The same step is applied to all questions to obtain the relative ranking of the lowest level in the hierarchy. Then, to aggregate the PVs from bottom to top, the normalized weights of each level are considered in order to prepare all questions' PVs for the final aggregation.
Table 3. Case study assessment values. Columns: Goals (Component / Security feature), Question, Metrics (S1 S2 S3), User requirements – weights (U1 U2 U3).
Secure Endpoint (E) 1 1 1 1 1 1 0 2 9 2 1 1 1 1 3 1 1 1 0 4 1 1 1 1 2 1 1 1 1 0 2 1 1 1 1 3 1 1 1 1 4 1 1 1 0 3 1 1 1 1 0 2 1 1 1 0 3 1 1 1 1
Secure Mobile (M) 1 1 1 1 0 1 0 4 5 2 1 0 1 1 1 1
Secure Back-end (B) 1 1 1 0 1 0 8 4 9 2 1 1 1 1 3 1 1 1 1 4 1 1 1 1 2 1 1 1 1 1 0 2 1 1 1 0 3 0 1 1 1
Thus, the final weighted ranking is PV_B.2.3 = (1/3 · 1/2 · 1/3 · 1 · 1 · 1) PV_B.2.3; similarly, PV_E.2.4 = (1/3 · 1/3 · 1/4 · 1 · 0 · 1) PV_E.2.4. Once all PVs have been weighted, aggregating them reveals the final overall rankings. Thus, in case 1, S2 fulfills the user security requirements (Fig. 4c). The rankings at lower levels can be compared in the same way: Figs. 4a and 4b show the component-level and security-feature-level comparisons, respectively.
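Continuing the same non-authoritative sketch, the aggregation of Eq. (3) amounts to a weighted sum of per-question priority vectors; the weights and vectors below are hypothetical placeholders, not the paper's actual case-study numbers:

```python
import numpy as np

# Hypothetical per-question priority vectors (entry order: S1, S2, S3, U)
# and their already-normalized hierarchical weights w_j.
pvs = {
    "B.2.3": np.array([0.0, 1/3, 1/3, 1/3]),
    "E.2.4": np.array([0.25, 0.25, 0.25, 0.25]),
}
weights = {"B.2.3": 1/3 * 1/2 * 1/3, "E.2.4": 1/3 * 1/3 * 1/4}

# Eq. (3): PV_aggregated = sum over j of w_j * PV_j
pv_aggregated = sum(w * pvs[q] for q, w in weights.items())
print(pv_aggregated.round(3))
```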
Case 2 In this case, the user assigned priority weights at various levels. For level 1, E is assigned a weight of 2, which denotes low importance. For level 2, B.1 is assigned a weight of 8, which denotes relatively high importance, whereas B.2 is not required and hence is assigned a weight of 0; thus, the normalized weights for B.1 and B.2 are 1 and 0, respectively. For instance, the weighted PV_B.1.1 = 0.4 · 1 · 1/4 · 1 · PV_B.1.1. Finally, for the lowest level, M.1.2.1 is assigned a weight of 0 (not required) and M.2.1 is assigned 1. Applying the steps described in Case 1 to all questions with the new weights yields the final rankings. As Fig. 4c shows, unlike the other cases, only S3 satisfies the user security requirements. This is because, in this case, the endpoint security features are not important and were given a low weight; since S3 fully satisfies the other components, it obtains a better ranking. Fig. 4. Case study assessment results. Case 3 In this case, the user assigned weights only to level 1. The normalized weights are B = E = 0.39 and M = 0.22. Thus, as shown in Fig. 4c, the final ranking reveals that only S2 fulfills the user requirements. 6 Evaluation To evaluate the framework, we present two methods. First, to verify the completeness of the list of assessment criteria, we tested its ability to identify and avoid known real-world security incidents. Since our list of assessment criteria is collected from publications by multiple specialized organizations, it should cover all security considerations related to the IoMT. We verified that by gathering all IoMT-related vulnerabilities reported, as of April 2018, in NIST's National Vulnerability Database (NVD)¹ and CVE Details² during the last three years to ensure their recentness. The keywords used in this extensive search were IoT, IoMT, medical, health, medical device, and healthcare. After filtering out the vulnerabilities that are irrelevant to the IoMT (e.g., non-medical endpoints), we found 40 distinct vulnerabilities. We then analyzed the details of each vulnerability and mapped it to the corresponding security feature(s). In this way, we verified our framework's accuracy in highlighting all missing or inadequate security features. Table 4 shows the results of our analysis, listing each vulnerability with its Common Vulnerabilities and Exposures (CVE) ID and the most relevant feature(s) in the affected IoMT component. It is very likely that every vulnerability is covered by more than one security feature. As shown in Table 4, these vulnerabilities have diverse characteristics in terms of the affected IoMT component, solution type, and scenario. Since our framework is able to provide security considerations that safeguard against these varied vulnerabilities, we believe it can scale well to different and unknown vulnerabilities. This also demonstrates the framework's extensibility and cross-domain applicability.
¹ https://nvd.nist.gov.
² https://www.cvedetails.com.
To verify the effectiveness of the framework in capturing missing or inadequate security features, we analyzed two commercial IoMT solutions that are known to have, or to have had, serious security issues. For example, Medfusion 4000 syringe infusion pumps³ are stationary medical endpoints that are used to deliver small doses of medication in acute care settings. These pumps were vulnerable to eight serious security issues (vulnerabilities 1–8 in Table 4).
These vulnerabilities are discussed in detail in an advisory issued by the U.S. Computer Emergency Response Team (CERT) [33]. Using our framework to assess the security of this device (before the patches were applied) and compare it with other devices would show that the device has a low security score, especially regarding authentication. This information should help future adopters in making better decisions; for instance, they can choose a better alternative or wait until the vulnerabilities are patched. This helps users or adopters avoid the severe consequences associated with these unpatched endpoint devices, whose severity was rated in the Common Vulnerability Scoring System (CVSS)⁴ as medium to high [33]. Similarly, Kaa⁵ is an IoT platform that allows healthcare systems to establish cross-device connectivity and implement smart features in medical devices and related software systems. Kaa is vulnerable (no. 9) to code injection attacks; comparing its security with other platforms will result in a low score for feature B.2.
Table 4. IoMT vulnerabilities and their relevancy to our assessment framework.
  No.   Vulnerability CVE ID   Relevant feature(s)
  1     2017-12726             E.2
  2     2017-12725             E.2
  3     2017-12724             E.2
  4     2017-12720             E.2
  5     2017-12723             E.10
  6     2017-12722             E.4
  7     2017-12721             E.5
  8     2017-12718             E.8
  9     2017-7911              B.2
  10    2017-11498             B.2
  11    2017-11497             B.2
  12    2017-11496             B.2
  13    2017-6780              B.14
  14    2017-7730              G.3
  15    2017-7729              G.1
  16    2017-7728              G.5
  17    2017-7726              G.7
  18    2017-3215              M.3
  19    2017-3214              M.7
  20    2017-8403              M.3
  21    2017-5675              E.8
  22    2017-5674              E.8
  23    2017-14002             E.2
  24    2018-5457              B.11
  25    2016-8355              E.6, E.8
  26    2017-6018              E.9
  27    2017-5149              E.5
  28    2015-3958              E.1
  29    2015-3957              E.10
  30    2015-3955              E.7
  31    2015-1011              E.2
  32    2015-3459              E.5
  33    2017-14008             B.12
  34    2017-14004             B.12
  35    2017-14006             B.12
  36    2017-14101             B.13
  37    2018-5438              B.12
  38    2016-9353              B.9
  39    2016-8358              E.5
  40    2017-12713             B.12
³ https://www.smiths-medical.com.
⁴ https://nvd.nist.gov/vuln-metrics/cvss.
⁵ https://www.kaaproject.org/healthcare/.
7 Limitations Solution providers cannot be forced to cooperate by making the technical details of their products available to the public, owing to service abstraction constraints. This lack of technical details can be one limitation of this work, as these details are required for the assessment. Nevertheless, adopters can always contact the solution providers' customer service to inquire about missing information; this also gives them the opportunity to find out how cooperative and knowledgeable the customer service teams of the candidate solutions are. We do not anticipate that providers will voluntarily make their security features publicly available, but our work can motivate them to cooperate in order to meet customers' needs and compete with others transparently. Moreover, the assessment criteria might not be easy to understand, especially for novice users such as patients and medical professionals, who often lack the technical knowledge; however, this work encourages them to learn about the security features and the potential issues. Some users might also find the process followed in this framework lengthy and complex. Nevertheless, we argue that it is worth the initial effort and time investment because it helps in discovering and avoiding the severe consequences of improper security. 8 Conclusion and Future Work Security plays a vital role in IoMT success.
In this paper, we presented a security assessment framework to increase the trust in IoMT solutions. This framework pro-vides a list of security assessment criteria for IoMT solutions, composed of detailed and simple-to-use questions. Using this assessment criteria, the framework also provides an assessment method for IoMT solutions. The signi?cance of this work lies in its ability to assess a wide range of (1) stakeholders’ requirements (e.g., patients, medical pro-fessionals, system administrators etc.); (2) solutions (services, devices, platforms, etc.); and (3) architectures (e.g., mobile-controlled, cloud-based, etc.). This work educates IoMT users (e.g., patients, medical professionals, etc.) who often have a low level of awareness about the IoMT security issues and how to address them. The bene?ts of this work are not only limited to adopters. This framework can also be bene?cial to IoMT solution providers in assessing their products and compare them to other IoMT solutions. This encourages healthier and transparent competition among solution providers. Moreover, researchers and legislators/standardization bodies can utilize it to understand the security issues in order to better design security solutions and regulations. Our future work includes updating the list of assessment criteria that was mentioned in this paper as well as in our previous work [34] to adapt to the continuous and rapid evolution of IoMT solutions and their technologies. We will also develop a web-based tool based on the framework presented in this paper. References 1. A Guide to the Internet of Things Infographic. https://intel.com/content/www/us/en/internet-of- things/infographics/guide-to-iot.html 2. 87% of Healthcare Organizations Will Adopt Internet of Things Technology by 2019 (2017). https://www.hipaajournal.com/87pc-healthcare-organizations-adopt-internet-of-things-technology- 2019–8712/ 3. Alsubaei, F., Abuhussein, A., Shiva, S.: Security and privacy in the internet of medical things: taxonomy and risk assessment. In: 2017 IEEE 42nd Conference on Local Computer Networks Workshops (LCN Workshops), pp. 112–120 (2017) 4. Cyber Risk Services|Deloitte US|Enterprise Risk Services. https://www2.deloitte.com/us/en/ pages/risk/solutions/cyber-risk-services.html 5. Inc, S.: Synopsys and Ponemon study highlights critical security de?ciencies in medical devices. https://www.prnewswire.com/news-releases/synopsys-and-ponemon-study-highlights- critical-security-de?ciencies-in-medical-devices-300463669.html 6. Medical Devices are the Next Security Nightmare. https://www.wired.com/2017/03/medical-devices- next-security-nightmare/ 7. Hamlyn-Harris, J.H.: Three Reasons Why Pacemakers are Vulnerable to Hacking. http:// theconversation.com/three-reasons-why-pacemakers-are-vulnerable-to-hacking-83362 8. Jalali, M.S., Kaiser, J.P.: Cybersecurity in hospitals: a systematic, organizational perspective. J. Med. Internet Res. 28, 10059 (2018) 222 F. Alsubaei et al. 9. MSV, J.: Security is Fast Becoming the Achilles Heel of Consumer Internet of Things. https://www.forbes.com/sites/janakirammsv/2016/11/05/security-the-fast-turning-to-be-the-achilles- heel-of-consumer-internet-of-things/ 10. Abie, H., Balasingham, I.: Risk-based adaptive security for smart IoT in eHealth. In: Proceedings of the 7th International Conference on Body Area Networks, pp. 269–275. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering) (2012) 11. 
Savola, R.M., Savolainen, P., Evesti, A., Abie, H., Sihvonen, M.: Risk-driven security metrics development for an e-health IoT application. In: Information Security for South Africa (ISSA), pp. 1–6. IEEE (2015) 12. Food and Drug Administration: Postmarket Management of Cybersecurity in Medical Devices (2016). https://www.fda.gov/downloads/MedicalDevices/DeviceRegulationand Guidance/GuidanceDocuments/UCM482022.pdf 13. MDRAP|Home Page. https://mdrap.mdiss.org/ 14. McMahon, E., Williams, R., El, M., Samtani, S., Patton, M., Chen, H.: Assessing medical device vulnerabilities on the Internet of Things. In: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 176–178. IEEE (2017) 15. Medical Equipment in General. https://www.iso.org/ics/11.040.01/x/ 16. New Security Guidance for Early Adopters of the IoT. https://cloudsecurityalliance.org/ download/new-security-guidance-for-early-adopters-of-the-iot/ 17. OWASP Internet of Things Project-OWASP. https://owasp.org/index.php/OWASP_ Internet_of_Things_Project#tab = Medical_Devices 18. [Press Release WP29] Opinion on the Internet of Things|CNIL. https://www.cnil.fr/en/press-release- wp29-opinion-internet-things 19. GSMA IoT Security Guidelines-Complete Document Set. https://www.gsma.com/iot/gsma-iot- security-guidelines-complete-document-set/ 20. Laplante, P.A., Kassab, M., Laplante, N.L., Voas, J.M.: Building caring healthcare systems in the internet of things. IEEE Syst. J. 12, 1–8 (2017) 21. Islam, S.M.R., Kwak, D., Kabir, M.H., Hossain, M., Kwak, K.S.: The internet of things for health care: a comprehensive survey. IEEE Access. 3, 678–708 (2015) 22. Williams, P.A., Woodward, A.J.: Cybersecurity vulnerabilities in medical devices: a complex environment and multifaceted problem. Med. Devices Auckl. NZ. 8, 305–316 (2015) 23. Leister, W., Hamdi, M., Abie, H., Poslad, S.: An evaluation framework for adaptive security for the iot in ehealth. Int. J. Adv. Secur. 7(3&4), 93–109 (2014) 24. Wu, T., Zhao, G.: A novel risk assessment model for privacy security in Internet of Things. Wuhan Univ. J. Nat. Sci. 19, 398–404 (2014) 25. Caldiera, V., Rombach, H.D.: The goal question metric approach. Encycl. Softw. Eng. 2, 528–532 (1994) 26. Bayuk, J., Mostashari, A.: Measuring systems security. Syst. Eng. 16, 1–14 (2013) 27. OWASP Internet of Things Project-OWASP. https://www.owasp.org/index.php/OWASP_ Internet_of_Things_Project 28. Health, C. for D. and R.: Digital Health-Cybersecurity. https://www.fda.gov/ MedicalDevices/DigitalHealth/ucm373213.htm 29. Naval Medical Logistics Command (NMLC): Medical Device Risk Assessment Question-naire Version 3.0. (2016). http://www.med.navy.mil/sites/nmlc/Public_Docs/Solicitations/ RFP/MDRA%203.0-20160815RX.PDF 30. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1, 83–98 (2008) A Framework for Ranking IoMT Solutions 223 31. Cheng, Y., Deng, J., Li, J., DeLoach, S.A., Singhal, A., Ou, X.: Metrics of Security. In: Kott, A., Wang, C., Erbacher, R.F. (eds.) Cyber Defense and Situational Awareness, pp. 263–295. Springer International Publishing, Cham (2014) 32. Saaty, T.L.: Decision-making with the AHP: why is the principal eigenvector necessary. Eur. J. Oper. Res. 145, 85–91 (2003) 33. Smiths Medical Medfusion 4000 Wireless Syringe Infusion Pump Vulnerabilities (Update A)|ICS-CERT. https://ics-cert.us-cert.gov/advisories/ICSMA-17-250-02A 34. Alsubaei, F., Abuhussein, A., Shiva, S.: Quantifying security and privacy in Internet of Things solutions. 
In: NOMS 2018–2018 IEEE/IFIP Network Operations and Management Symposium, pp. 1–6 (2018) 224 F. Alsubaei et al. CUSTODY: An IoT Based Patient Surveillance Device Md. Sadad Mahamud(?) , Md. Manirul Islam, Md. Saniat Rahman, and Samiul Haque Suman American International University-Bangladesh, Dhaka, Bangladesh {sadad,manirul,saniat,samiul}@aiub.edu Abstract. In this paper, the authors present an assistance device for patient’s surveillance. An IoT based system is developed for monitoring patient’s heart rate, body temperature and saline rate. An Arduino microcontroller is used here for processing the data and ESP32 module is used for monitoring the patient’s data through internet and a GSM module is used for notifying the doctors in emergency case. The main objective of this project is to help the doctors and nurses to monitor a patient’s health condition through internet and over cellular network. On the other hand, if the monitoring parameters exceed beyond their nominal values, the ready message is sent to the concerned duty doctor as well as the attendant and display it in the LCD screen and a speci?c audio sound is played for urgent awareness. Keywords: IoT · ESP32 module · Arduino · Heart rate · Body temperature Saline measurement · GSM · Micro SD card module · Audio · LCD display 1 Introduction The arrival of modern technology has made our lives much easier and comfortable in comparison with the previous decades. But after having this technology, still a lot of medical patients die each year due to the lack of integration of the technologies and make it accessible at a very a?ordable cost. It is so di?cult for a doctor to monitor a patient 24/7 incessantly who is su?ering from critical disease or some corporal malady. One of the CCN health reports showed the 10 shocking medical mistakes for the patient’s date case [1] and most of them occurred for lack of timely care. Hence, to remove human trouble and lessen the compulsion of monitoring a patient restlessly from a doctor and a nurse, this paper proposes a low-cost surveillance system called CUSTODY for moni- toring a patient through internet with the conduct of GSM technology. Health monitoring system measures patient’s health condition in regular interval of time. This paper describes the design of an IoT based pulse rate, saline level rate and body temperature measuring system with the help of Arduino microcontroller and ESP32 module. The system raises an alarm when the pulse rate or body temperature rate or the saline level goes beyond or falls behind the threshold value and sends an emergency alert noti?cation to concerned doctor and family member. Patient’s real-time monitoring parameters can also be viewed via internet at any time. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 225–234, 2019. https://doi.org/10.1007/978-3-030-02686-8_18 2 Related Works A Heart rate monitoring system is designed by P.A. Pawar [2] using IR based sensor which can measure the heart rate and send the signal through GSM module. This system is also based on the Arduino Microcontroller. Author mainly designed this system for home-based e?ective heart rate monitoring system. A LPC2129 health monitoring system is designed by M. Pereira [3]. In this paper the authors presented an IoT based device using ARM 7 processor, ECG, Heart Rate, AD8232, and Body Fat percentage module. 
The main idea presented in that paper is to provide better and more efficient health services to patients by implementing a networked information cloud, so that experts and doctors can make use of the data and provide a quick and efficient solution. An IoT-based health monitoring system that uses only an Android app for monitoring is proposed by N. Gupta and his co-authors [4]. The paper presents a health monitoring system using a pulse oximeter sensor, temperature sensors, and a PIR motion sensor; the patient's data is uploaded to a custom server over the GPRS network. According to that study, a health monitoring system is an efficient way to keep track of one's health condition. C. Raj and his co-authors proposed an IoT-based e-health care system for remote telemedicine [5]. For testing their system, they used body temperature, pulse oximeter, ECG, GSR, and EMG sensors to measure the patient's body parameters. The paper mainly focused on building a common interface between multiple remote centers and medical practitioners. An IoT-based smart health care system using CNT electrodes is designed by M. Bansal and his co-authors [6]. The main objective of that paper is to provide people with an effective solution for living comfortably in their homes or workplaces instead of going to expensive hospitals. S. Lavanya and co-authors developed a remote prescription and i-home healthcare system based on IoT [7]. The authors used a heart rate sensor, a real-time clock, RFID tags, and a Raspberry Pi server for network connectivity. In general, that paper presents an IoT-based intelligent home-centric healthcare platform which seamlessly connects smart sensors attached to the human body for physiological monitoring and daily medication management. Much more research is ongoing in this broad field.

3 Architecture Model of the System

Our system is based on an Arduino Mega microcontroller unit board and an ESP32 WiFi module board. All the sensor data are fetched and decoded by the microcontroller and then sent in real time using the ESP32 module. Figure 1 describes the architecture model of the proposed system.

Fig. 1. Architecture model of the system.

4 Design System

The total system is primarily based on the Arduino Mega microcontroller, which serves as the main controlling unit. After receiving the data from the temperature sensor, saline load sensor, and pulse sensor, the microcontroller unit decodes the data for the final operation. The ESP32 WiFi module is used for communication with the public network. All the data received by the Arduino is stored on the micro SD card module, and that stored data is made available to the web server through the ESP32 module. Figure 2 shows the simulation model of the total system. The simulation was done with the Fritzing simulation software [8].

Fig. 2. Simulation circuit diagram.

If any of the sensor values crosses its predefined nominal value, a predefined SMS or call is sent to the doctor and a specific audio sound is played. The LCD display panel shows the current state of the patient.

4.1 Arduino Mega 2560

The Arduino Mega 2560 is a microcontroller board based on the ATmega2560. The Mega 2560 is designed for more complex projects, with 54 digital I/O pins and 16 analog inputs [9]. The Mega is the main controlling unit for this system.
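To make the sensing-and-alert flow of Sects. 3 and 4 concrete, the following minimal Arduino-style sketch (C++) illustrates one possible shape of the main control loop: read the three sensors, log to the SD card and cloud, and raise the SMS/audio/LCD alarms summarized later in Table 1. The thresholds, helper names, and stub implementations are illustrative assumptions, not the authors' actual firmware; the pulse and temperature helpers correspond to the calculations derived in Sects. 4.2 and 4.3 below.

```cpp
#include <Arduino.h>

// Illustrative alarm thresholds (the paper does not list its nominal values).
const float BPM_MAX          = 120.0;  // pulse-rate alarm threshold (assumed)
const float TEMP_F_MAX       = 100.4;  // body temperature in Fahrenheit (assumed)
const int   SALINE_MIN_LEVEL = 1;      // level 1 = saline almost finished (Sect. 4.4)

// Hypothetical helpers standing in for the sensor, GSM, audio, LCD,
// SD-card and ESP32 routines described in Sects. 4.2-4.9.
float readPulseBPM();                          // Sect. 4.2, Eq. (1)
float readBodyTempF();                         // Sect. 4.3, Eq. (2)
int   readSalineLevel();                       // 3 = full, 2 = half, 1 = almost finished
void  logToSdAndCloud(float bpm, float tf, int saline);  // Sects. 4.5 and 4.7
void  sendSms(const char *msg);                // Sect. 4.6, SIM900A
void  playAlertAudio();                        // Sect. 4.8, clip stored on the SD card
void  showOnLcd(const char *msg);              // Sect. 4.9, 16x2 LCD

void setup() {
  Serial.begin(9600);                          // serial monitor, as in Figs. 3 and 4
}

void loop() {
  float bpm    = readPulseBPM();
  float tempF  = readBodyTempF();
  int   saline = readSalineLevel();

  logToSdAndCloud(bpm, tempF, saline);         // SD card + ESP32 -> web portal

  bool alarm = false;
  if (bpm > BPM_MAX)              { sendSms("Pulse rate increased");        alarm = true; }
  if (tempF > TEMP_F_MAX)         { sendSms("Body temperature increased");  alarm = true; }
  if (saline <= SALINE_MIN_LEVEL) { sendSms("Saline level low");            alarm = true; }

  if (alarm) { playAlertAudio(); showOnLcd("EMERGENCY"); }
  else       { showOnLcd("Patient condition: normal"); }

  delay(1000);                                 // sampling interval (assumed)
}

// Stubs so the sketch compiles; real drivers would replace these.
float readPulseBPM()                      { return 72.0; }
float readBodyTempF()                     { return 98.6; }
int   readSalineLevel()                   { return 3; }
void  logToSdAndCloud(float, float, int)  {}
void  sendSms(const char *m)              { Serial.println(m); }
void  playAlertAudio()                    {}
void  showOnLcd(const char *m)            { Serial.println(m); }
```

In the real device the thresholds, messages, and timing would follow whatever values the authors configured; the sketch only mirrors the alarm logic that Sects. 4.2-4.9 and Table 1 describe.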
4.2 Pulse Sensor

The heart beat can be measured from the variation in optical power as light is scattered or absorbed along its path through the blood while the heart beats [10]. In this system we have used an ear-clip pulse sensor. Let value_1 and value_2 be the counter values at the first and last pulse of a window of ten pulses. Then Ten_Pulse_time = value_1 - value_2, and Single_pulse_time = Ten_Pulse_time / 10. Our final equation for beats per minute (BPM) is:

Heart rate (BPM) = 60 / Single_pulse_time   (1)

After calculating the pulse rate using (1), the Arduino stores the current rate on the internet server through the ESP32 module, and if the pulse rate crosses the nominal value it sends an SMS, plays an audio output, and shows the current condition on the display. Figures 3 and 4 show the change in pulse rate measured in the Arduino Serial Monitor for our test patient, a 23-year-old boy.

Fig. 3. Normal pulse rate. Fig. 4. Increased pulse rate.

4.3 Temperature Sensor

This system uses the waterproof DS18B20 temperature sensor. The DS18B20 provides 9- to 12-bit (configurable) temperature readings over a one-wire [12] interface [11]. Here, Temp = Output voltage * 0.48828125, and finally

Temp_final = (Temp * 1.8) + 32   (2)

Using (2) [13], the Arduino calculates the patient's body temperature and executes its operation.

4.4 Load Sensor Module

For this system we have used a strain gauge load cell module [14]. The main concept behind the load sensor is to measure the saline weight, because we cannot put any sensor inside the saline packet. The load sensor measures the saline weight in liters and divides it into three levels: level 3 indicates that the saline is full, level 2 indicates that it is half full, and level 1 indicates that it is almost finished, so the nurse or doctor should change the saline packet. The load sensor values are fetched by the Arduino Mega microcontroller, checked against the set values, and an alarm is triggered if necessary.

4.5 ESP32 WiFi Module

The ESP32 already integrates an antenna, power amplifier, low-noise amplifiers, filters, and a power management module, so the entire solution takes up very little printed circuit board area. The board provides 2.4 GHz dual-mode Wi-Fi and Bluetooth built with TSMC 40 nm low-power technology [15]. In this system the ESP32 is used to connect the system to the cloud: the module reads the sensor data saved on the SD card and pushes it to the cloud. A private cloud domain server was created for testing this system as "custody.com". The monitoring web portal is written in PHP, and all the data is stored in a MySQL server. After logging into the web portal, the user can see the patient's real-time condition using the patient ID. The ESP32 module operates in the network layer of the OSI model [16]. Figure 5 shows the current health condition of a patient in the CUSTODY web portal.

Fig. 5. Private cloud domain server custody.com.

4.6 SIM900A GSM Module

GSM is mainly used in devices such as mobile phones as well as for long-distance communication; it transmits and receives data over GPRS, supports video calls, and sends SMS [17]. In this project the SIM900A GSM module is used for sending SMS. When the sensor values exceed their given ranges, the GSM module sends an SMS to selected numbers. Figures 10 and 12 show the SMS received by the cell phone describing the patient's condition.

4.7 Micro SD Card Module

The micro SD card module transfers data from an SD card.
The Arduino relates to the SD card through the breakout board and audio commands were saved in this SD card. The connection of the module with Arduino is shown in Fig. 2. When any sensor value crosses the range for the given coordinates an audio output will be generated to make the people aware about the danger. And it also stores the sensor data in the SD card for transfer it to internet with the help of Arduino and ESP32 Module. 4.8 Audio Ampli?er and Speaker When the audio commands are played form the micro SD card the audio volume is relatively low. So, to make it louder, we used our own custom made 9 V hearable audio ampli?er. Audio ampli?er was made using LA4440 IC and an 8 ?n speaker is used. 4.9 16 * 2 LCD Display A 16 * 2 LCD display is connected with the system. This display shows the current sensor value and the patient’s current condition. 5 Hardware Model Figure 6 shows the hardware model of the system. All the sensors are connected with the Arduino and the output results are displayed into the LCD module along with Emer- gency audio output. 230 Md. S. Mahamud et al. Fig. 6. Hardware model of the system. 6 Results Table 1 shows the test result of this system where the audio output and SMS output are set. We test this system for only one patient. Di?erent analysis results for this system are given below. Figures 7 and 8 show the test result when patient is in normal condition and the saline level is in normal condition as well. No SMS will be triggered or no audio will be played. The web portal will have patient’s current data. Figures 9 and 10 show the test output when the saline is in low-level condition. For testing purpose, we used a 250 ml bottle as a saline packet. When saline is almost ?nished the load sensor gets a very small weight and a SMS will be sent and audio will be played. Figures 11 and 12 show the test output of the situation when the temperature increases. We increased the temperature manually and a SMS is sent to the pre-de?ned number and an audio is played as well. Table 1. Results analysis for audio and SMS output Condition Audio SMS Normal condition No audio No SMS Normal condition No audio No SMS Normal condition No audio No SMS Body temperature increased Audio played SMS sent Pulse rate Increased Audio played SMS sent Saline level low Audio played SMS sent Figure 13 shows the web server monitoring window when the patient’s pulse rate is increased. And Fig. 14 shows that if anyhow two or three parameter falls in one time the system will return patient’s condition as emergency. On this condition a call will be sent to the attending doctor and also an emergency audio will be played by the system. CUSTODY: An IoT Based Patient Surveillance Device 231 Fig. 7. Normal condition. Fig. 8. Normal saline condition test. Fig. 9. Saline level low. Fig. 10. SMS received for saline low. Fig. 11. Temperature increased. Fig. 12. SMS sent for temp. increased. 232 Md. S. Mahamud et al. Fig. 13. Web server monitoring when patient pulse rate increased. Fig. 14. Web server monitoring when more than one sensor parameter crosses its nominal value. 7 Conclusion The main objective of this paper is to create a low-cost IoT based medical surveillance system that can be a true virtual assistant to a doctor using smart technique. Real-time monitoring of the patient’s current health condition by family members is an added advantage of this system. The initial test run of the prototype is successful. But some future work is needed for this system. 
More upgraded sensors can be used to calculate the pulse rate. As it is an IoT based system the patient’s data must be safe and the data processing must be faster. In future, further research can be carried out to improve the algorithm of the system. CUSTODY: An IoT Based Patient Surveillance Device 233 References 1. 10 shocking medical mistakes—CNN. https://www.cnn.com/2012/06/09/health/medical– mistakes/index.html. Accessed 2018 2. Heart rate monitoring system using IR base sensor and Arduino Uno—IEEE Conference Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/7057005/. Accessed 25 Apr 2018 3. A novel IoT based health monitoring system using LPC2129—IEEE Conference Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/8256660/. Accessed 25 Apr 2018 4. IOT based health monitoring systems—IEEE Conference Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/8276181/. Accessed 25 Apr 2018 5. HEMAN: Health monitoring and nous: An IoT based e-health care system for remote telemedicine—IEEE Conference Publication. Ieeexplore.ieee.org (2018). https:// ieeexplore.ieee.org/document/8300134/. Accessed 19 Jun 2018 6. IoT based smart health care system using CNT electrodes (for continuous ECG monitoring) —IEEE Conference Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/ document/8230002/. Accessed 19 Jun 2018 7. Remote prescription and I-Home healthcare based on IoT—IEEE conference publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/8094069/. Accessed 19 Jun 2018 8. Fritzing. Fritzing.org (2018). http://fritzing.org/home/. Accessed 25 Apr 2018 9. A. [closed]: Arduino Mega 2560 serial port location. Arduino.stackexchange.com (2018). https://arduino.stackexchange.com/questions/47727/arduino-mega-2560-serial-port-location. Accessed 25 Apr 2018 10. Grove—Ear-clip Heart Rate Sensor| Techshopbd. Techshopbd.com (2018). https:// www.techshopbd.com/product-categories/biometrics/1389/grove-ear-clip-heart-rate-sensor- techshop-bangladesh. Accessed 25 Apr 2018 11. https://playground.arduino.cc/Learning/OneWire. Accessed 25 Apr 2018 12. DS18B20 Digital Temperature Sensor (CN) | Techshopbd. Techshopbd.com (2018). https:// www.TechSoup.com/product-categories/temperature/2796/ds18b20-digital-temperature-sensor- cn-techshop-bangladesh. Accessed 25 Apr 2018 13. Sensing heart beat and body temperature digitally using Arduino—IEEE Conference Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/7955737/. Accessed 25 Apr 2018 14. D. Load Cell—200 kg, S. Load Cell—10 kg and D. Load Cell—50 kg: Getting started with load cells—learn.sparkfun.com. Learn.sparkfun.com (2018). https://learn.sparkfun.com/ tutorials/getting-started-with-load-cells. Accessed 25 Apr 2018 15. Overview | Espressif Systems. Espressif.com (2018). https://www.espressif.com/en/ products/hardware/esp32-devkitc/overview. Accessed 25 Apr 2018 16. What is OSI model (Open Systems Interconnection)?—De?nition from WhatIs.com. SearchNetworking (2018). https://searchnetworking.techtarget.com/de?nition/OSI. Accessed 25 Apr 2018 17. Sim900a Gsm Module Interfacing with Arduino Uno. Electronicwings.com (2018). http:// www.electronicwings.com/arduino/sim900a-gsm-module-interfacing-with-arduino-uno. Accessed 25 Apr 2018 234 Md. S. Mahamud et al. 
Personal Branding and Digital Citizenry: Harnessing the Power of Data and IOT Fawzi BenMessaoud(&) , Thomas Sewell III, and Sarah Ryan School of Informatics and Computing, Indiana University and Purdue University, Indianapolis, IN 46202, USA fawzbenm@iu.edu Abstract. With all that the internet has to offer, it is easy to get lost in the myriad of resources available to us both academically and socially. We have so many ways to learn, connect, and promote ourselves that in trying to stay current in today’s digital world, we can quickly ?nd ourselves overwhelmed. To be successful, we need a way to conveniently organize educational materials and references while also ensuring that only our very best self is on display. According to a study we conducted on this subject, the idea of personal online management is something which many value highly, but are unsure how to fully realize. We feel like this is problematic for any modern user, but this can be resolved. Using multiple data collection methods in our research, we explored the concept of “Digital Citizenship”. Digital Citizenship is de?ned as a way of expressing the online presence and personal brand that users have curated in a digital space; as well as a simpler, more ef?cient way to store and organize a personal digital library. We are presenting an app that would help to ?ll this need within the realm of academia and beyond. This app is a way of simplifying our lives, making the internet more accessible and managing personal, educa-tional, and academic materials, online pro?les, and social media accounts. Keywords: Personal brand .n Digital footprint .n Digital Citizenry Social media 1 Major Aspects Our study was based on three factors that we felt were interdependent. We were interested in seeing how people consider their public data, represent their images online, and how they store personal data. These topics were labeled as Personal Brand, Web Presence, and Digital Content Storage. 2 Personal Brand Personal Brand encompasses the way a person presents themselves online. Bridgen asserted that a person can become successful by developing and marketing their per-sonal brand, highlighting themselves in a positive light and developing their online self in such away to be engaging to others [2]. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 235–240, 2019. https://doi.org/10.1007/978-3-030-02686-8_19 Over a period of time, successful individuals obtain a reputation and position based on a combination of their expertise and “connectedness”, which makes them attractive to other players operating in the same space. An authentic personal brand therefore delivers both a track record and a promise of the ongoing delivery of value. From the Journal of Business Strategy, we see the statement, “… in most cases authentic per-sonal brand builders are genuinely strong performers who are highly sought after by employers because they have the ability to use their personal social capital for the bene?t of the organization and their own career progression within it” [3]. 2.1 Web Presence Inextricably linked to one’s personal brand is their web presence, particularly in the context of social media. Web presence is the public way an individual is observed from the point of view of an audience while on the internet. Jones postulated that the more connections people make, the larger their digital footprint, and the more likely potential employers will ?nd less positive aspects of a person’s digital life [4]. 
This is important to consider, particularly when searching for a career. However, no one lives in a vacuum, and digital connections with friends, family members, or even professional contacts is virtually inevitable in the world we live in today. According to Brake, “Pro?les and entries on Facebook, Twitter and many other such services can contain diaristic or confessional material that looks as if it is only for the author to read or perhaps for trusted friends and family - but although social media services often include tools to keep such writings private, many are visible to a large number of people or even published openly on the web with potential audience of millions.” [1]. The solution then is not simply to be aware, but to be able to manage one’s image in the digital space, promoting positive aspects while diminishing the aspects that are less so. In the article by Harris & Rae, the authors state, “… the ‘digital divide’ between the ‘haves’ and the ‘have nots’ in the developed world is now less about access to the web than it is about understanding how to actively participate in the networked society” [3] having the power to manage one’s overall web presence is key to success in modern times. 2.2 Digital Content Library The concept of Digital Content Storage is a library, a collection place of all digital content that a person owns and uses. This is similar to other methods used to save and share ?les of different types and sizes, such as Dropbox or Google Docs. The app we are presenting as a solution works in this same way, but with the added bonus of the “vault” feature, which would be a speci?c space located within the library with extra security features for more sensitive and restricted document ?les and information. 236 F. BenMessaoud et al. 3 Motivation Our motivation for this research was based on our hypothesis that the general populace has a lack of awareness regarding the importance of monitoring online presence and has dif?culty managing the vast resources available to them. We tested this theory. 3.1 Methods In choosing a method of study we thought it would appropriate to make use of an online survey in order to reach a variety of respondents in light of our triple constraints: we were able to reach the highest amount of people in our given time by the most cost-effective means. We conducted our study using Google forms. We posted several links to our survey on Facebook and Twitter, in order to gain a wide viewing and have the most success. Distributing the survey in this way allowed us to get feedback from those who may no longer be students or in the academic world, and did not assume any prior knowledge of our topics, giving us the widest possible net to cast for data. This survey included questions based on personal brand, web presence, and digital content storage, gauging the participants both in their current knowledge of these topics and also their current usage of applications and software/hardware speci?c to these subjects. This initial survey was left open for one week. We used the analytics provided by Google Docs initially, and then used the raw data to analyze the information for ourselves to make our conclusions. We split this survey into sections, and each section was speci?c to one of the three topics we were testing. This allowed us to get a somewhat general idea of the prior knowledge our participants had for each of our topics. 
For example, “Are you Familiar with Personal Brand?” was a speci?c question we asked our participants in order to try to gain an understanding of what the general public might or might not already know about the subject, an approach we felt was useful in giving meaning to the survey. 3.2 Findings The data we collected from our surveys proved to hold a number of patterns which we found in the process of our analyzation. Our initial survey collected data from 60 volunteer participants. This gave us quite a bit of information, which was useful in gaining knowledge from a large variety of data. One of our main interests was in determining how important people considered their social media presence. We were interested in the importance people put on themselves and their personal media ?rst. Our results showed that over 50% of people placed themselves and their social media in the mid-range. 81.67% of respondents rated their social media at a 3 or higher on a scale of 1–5, with 66.67% rating themselves at 3 or 4, the middle rankings (see Fig. 1). Another interesting pattern we found in our data was the distribution of gender, in the way that affected our survey. In Fig. 2, our respondents were 66% female, 33% male, so we thought it prudent to measure some of our responses by gender to ?nd important differences in the use of social media and web accounts. Personal Branding and Digital Citizenry 237 According to these results, gender is not a highly determining factor when con-sidering number of social media accounts currently in use by users. This is especially interesting considering the gender of respondents: as stated, 66% of respondents were female and 33% male, and interestingly, our data shows a low level of disparity between the two. We determined that this further proves that a better knowledge of one’s digital footprint is universally bene?cial (see Fig. 2). We were also very interested to see how highly people determine the importance of security of their saved content, asking them to rank that importance on a scale of 1–5. Interestingly, from this data, we found that zero respondents rated their security Fig. 1. Results of a survey question regarding participant’s ranking of the importance of their own social media presence. Fig. 2. Side by Side graphic showing media accounts held by gender. We did not ?nd gender to have any signi?cant impact on number of social media accounts held by participants. 238 F. BenMessaoud et al. importance at one, the lowest. Alternatively, 56.67% of respondents rated their interest in security of saved data at a 5, the highest possibility on the scale, which shows very clearly how highly security is considered (see Fig. 3). 4 Conclusion In summary, we found that our initial hypothesis was correct. Our belief is that the internet is ever-expanding, producing more connections than have existed in any time prior. The many nuances of our presence in this digital space is often missed or not fully understood, and this can result in unexpected repercussions. The goal of our research was to see to what extent the people we surveyed were aware of their larger online presence and the way that they navigated the digital landscape. In examining the responses, we received for our survey, the patterns showed us that while the people we surveyed answered that they understood each part of the three categories we were questioning about, they lacked a big-picture perspective of how those categories were intertwined. 
Digital Citizenry combines these concepts together, providing users with a way to manage their online selves by understanding the overlap that comes from a digital space, and therefore empowering people to make the best decisions, both for their present and for their future. References 1. Brake, D.R.: Sharing Our Lives Online: Risks and Exposure in Social Media. Palgrave Macmillan, Hampshire (2014) Fig. 3. Figure graphing the result of our question regarding importance of saved content. Over 50% of all respondents indicated that it was of extreme importance to them by ranking security at the highest possible level. Personal Branding and Digital Citizenry 239 2. Bridgen, L.: Emotional labour and the pursuit of the personal brand: Public relations practitioners’ use of social media. J. Med. Pract. 12(1), 61–76 (2011). https://doi.org/10.1386/ jmpr.12.1.61_1 3. Harris, L., Rae, A.: Building a personal brand through social networking. J. Bus. Strategy 32 (5), 14–21 (2011). https://doi-org.proxy.ulib.uits.iu.edu/10.1108/02756661111165435. Accessed 9 Apr 2018 4. Jones, C., et al.: Net generation or digital natives: is there a distinct new generation entering university? Comput. Educ. 54(3), 722–732 (2010). https://doi.org/10.1016/j.compedu.2009. 09.022 240 F. BenMessaoud et al. Testing of Smart TV Applications: Key Ingredients, Challenges and Proposed Solutions Bestoun S. Ahmed(B) and Miroslav Bures Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University, Karlovo n´am. 13, 121 35 Praha 2, Czech Republic {albeybes,buresm3}@fel.cvut.cz Abstract. Smart TV applications are software applications that have been designed to run on smart TVs which are televisions with integrated Internet features. Nowadays, the smart TVs are going to dominate the television market, and the number of connected TVs is growing expo-nentially. This growth is accompanied by the increase of consumers and the use of smart TV applications that drive these devices. Due to the increasing demand for smart TV applications especially with the rise of the Internet of Things (IoT) services, it is essential to building an application with a certain level of quality. Despite the analogy between the smart TV and mobile apps, testing smart TV applications is di?er-ent in many aspects due to the di?erent nature of user interaction and development environment. To develop the ?eld and formulate the con-cepts of smart TV application testing, this paper aims to provide the essential ingredients, solutions, answers to the most critical questions, and open problems. In addition, we o?er initial results and proof of con-cepts for a creeper algorithm to detect essential views of the applications. This paper serves as an e?ort to report the key ingredients and chal-lenges of the smart TV application testing systematically to the research community. Keywords: Smart tv application testing · Software testing Model-based testing · Internet of Things (IoT) 1 Introduction A connected TV, which is popularly called smart TV, is a technological assem-blage device among computer and traditional television. The device is a combina-tion of conventional TV terminal, operating system (OS), and digital contents in which all of them are connected to the Internet. Smart TVs are providing di?er-ent digital services like multimedia, gaming, Internet browsing, on-demand enter-tainment access, a various online interactive session in addition to broadcasting media. 
In fact, these devices were expected to be more intelligent, interactive, and useful in the future [1]. Recently, the electronic companies along with IT ?rms were rising investments in the technological advancements of these devices .f c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 241–256, 2019. https://doi.org/10.1007/978-3-030-02686-8_20 242 B. S. Ahmed and M. Bures by launching new terminals and applications for smart TVs. It is expected shortly that these devices will be a frequent part of our smart homes within an Internet of Things (IoT) context1 . This explains why the smart TV market worth $265 Billion by 20162 . Just like the new technological smart devices, smart TVs are operated by an OS with di?erent applications (apps) installed on it. Although the OS is the key software for operation, the installed apps on the smart TV brings di?erent uses and functionalities to the device. At a glance, the smart TV app may look like a mobile app due to the similarities of the OSs or the development kits. Due to this “fake” similarity, one may think of testing smart TV apps just like the mobile app testing. However, in fact, testing smart TV apps is di?erent due to the nature of user interaction with the app itself. In mobile apps, the user is interacting with the device touchscreen (i.e., the application) directly by hand whereas, within smart TVs, the user is interacting with the app through another device which is the remote controller. Of course, some vendors are providing interaction by touchscreen to the users, but the way that application behaves is still based on the remote control device when it comes to testing practices. In addition, the user of any TV (including the smart TVs) is usually staying away from the screen and almost use the remote device to operate the apps all the time. In the literature, mobile apps testing is well-studied, and many research direc-tions have been established, (e.g., [2–4]). However, testing smart TV apps is a new area and many challenges still without a solution, and many research ques-tions may arise without answers. To address these challenges and questions, it is essential to explore the app structures, interaction ways, development envi-ronments, and the technology behind the apps. In doing so, this paper examines the key ingredients of smart TV app testing. The paper aims to address the most demandable questions. The paper also discusses the challenges addressed so far in the literature and open problems for test automation and generation. Based on that, a systematic framework for testing applications on Smart TVs is illustrated throughout a prototype. The framework includes the testing process, its steps, and also the test generation strategy. This will help to validate the di?erent aspects of the applications before release. This could also serve as an initiative topic for further research in the near future. The framework will help to address and formulate more open problems and research questions. The rest of this paper is organized as follows. Section 2 summarizes the related works in the literature and those e?orts in smart TV app testing that could be useful here. Section 3 explains the technology behind the smart TV apps. Section 4 illustrates some analogy and di?erences between mobile and smart TV apps. Section 5 describes the navigation and control mechanism of smart TV apps. Section 6 discusses the open research problems in the smart TV app testing. 
Section 7 de?nes a prototype for a systematic automated testing strategy. Section 8 discusses the functional and non-functional testing Opportunities in 1 https://read.bi/2L4CDSI. 2 https://bit.ly/2HxnMkL. Testing of Smart TV Applications 243 Smart TV Applications. Finally, Sect. 9 give concluding remarks and also future research recommendations. 2 Motivation and Literature Testing software applications on smart devices is considered to be a development and an evolution of testing practice from the traditional user interfaces (UI) like graphical user interface (GUI) and web application testing. The testing practices for these UIs have been studied extensively in the last decade, and as a result, many sophisticated methods, algorithms, and tools have been developed. Banerjee et al. [5] studied more than 230 articles published between 1991–2013 in the area of GUI testing and Li et al. [6] surveyed the literature in two decades of web application testing. Mobile application testing could be considered as the ?rst e?ort towards smart application testing. There are many di?erences between mobile apps and graphical/web UI. In fact, the main issue that makes the di?erence in the testing process is the user interaction with the application. In the standard GUI and web applications, the keyboard and mouse combination is still the standard user input to interact with the applications. However, this is not the case for mobile apps as the user interacts with the device touchscreen by ?ngers and hence, there would be di?erent interaction behavior from various users. Although this issue leads to develop new testing strategies for mobile apps, still many of these strategies are taking bene?ts, wholly or partially, from the earlier methods and practices published for GUI and web application testing. For example, Amal?tano et al. [7] developed MobiGUITAR strategy for systematic mobile application testing from the GUITAR strategy [8] for GUI testing. An extensive study on mobile application testing is presented in [2]. Smart TV application is a new smart device application type. The views of the application are not like other applications. The application structure looks like web application as it relies on HTML, CSS, and JavaScript; however, the user interaction with the application di?ers from other types of applications. Usually, the user is not interacting with the application directly by hand, and it should be through another input device, which is the remote device. This could lead to think that the testing process is similar to the GUI or web application. However, the remote device does not behave like the standard mouse. While the standard mouse input device can move in every direction on the application, the remote device movement is restricted to four explicit directions. The inter-action di?erence makes many obstacles and di?culties when it comes to testing process. While the general concepts of model-based testing are applicable here, the construction of the model and the model type makes the di?erence. For example, due to the di?erent interaction nature, Nguyen et al. [8] used Event Flow Graph (EFG) as a model of the GUI testing, whereas Amal?tano et al. [7] uses state machine as a model for the mobile application testing. In smart TV app, both EFG and state machine models are not applicable. In Smart TV app, each transition from a state to another is practically just one step, while 244 B. S. Ahmed and M. Bures this is not the case in other applications. 
For example, in the mobile app, the distance between two icons (states) does not make sense in the transition, while this is very important in the smart TV application, and that will lead to a di?er-ent model. An important e?ort to formulate this model is done recently by Cui et al. [9]. Here, the Hierarchical State Transition Matrix (HSTM) is proposed as a model for the Android smart TV applications. While the model is promising, there is a need to develop and formulate it for the complex structure of di?erent applications. In fact, testing smart TV apps could be seen from di?erent angles. For exam-ple, usability testing is one of the critical testing issues to address the interaction between the user and the smart TV through remote device. This will help to improve the quality of the user interfaces of the applications. Ingrosso et al. [10] addressed this issue by using several users to test an e-commerce application on smart TV. Security testing is also an essential issue in the smart TV apps. However, we could not ?nd a published study addressing security in Smart TV apps. Recently, Sabina C. [11] discussed and described some of the testing plat-forms for Smart TV apps. The study chooses Opera and Samsung TV Stores for testing the applications. The testing process relies on the upload of the applica-tions to the Opera and Samsung application stores to verify them based on the code writing. Hence, there is no de?nition of the testing strategy itself, and that could not be considered as a formal testing process. The study has also addressed the importance of functional testing of these applications without giving details since it is a bachelor study with limitations. Although it is essential from the industrial point of view, we could not ?nd many companies giving solutions for smart TV apps testing. One of the exciting projects so far is the suite.st framework3 . The framework depends on record and replay testing style by using two di?erent devices, one for recording the actions, and the other is for acting like an emulator. In fact, the platform dealing with the application just like a web application and uses record and replay style of testing being employed by SeleniumHQ4 . The framework is a good startup for the industry to adapt selenium style of testing for smart TV apps. Although the framework claims that it is dealing with the functional testing of mobile apps, still the pass/fail criteria are not clear from an academic point of view. As a result, there is a need to de?ne a test oracle for the framework. In addition, the framework does not rely on some automatic test generator for fully testing of the applications. In fact, de?ning a test oracle for smart TV application could be a new research direction as we will address it later in this paper. 3 Smart TV Apps Development and Technology Just like Android apps, smart TV apps are developed using Software Develop-ment Kits (SDK). The new versions of Android SDK supporting the development of smart TV apps. However, these applications can be run on Android Smart TV 3 https://suite.st. 4 http://www.seleniumhq.org/. Testing of Smart TV Applications 245 devices only. In fact, few SDKs were available for cross-platform development. For example, Josh?re5 Smart TV SDK was a platform to develop applications to work on Google and Samsung TV devices but not on LG TV devices. Mautilus6 Smart TV SDK is also a platform for development, but still, the application is working on some versions of devices only. 
Smart TV Alliance7 was the most advanced SDK by supporting di?erent features and platforms. However, the project is shut down, and the SDK is not available for download. Samsung Tizen SDK provides a set of tools and frameworks to develop smart TV apps through Tizen Studio. The SDK is depending on the latest web tech-nologies such as JavaScript, CSS, HTML5, and W3C widget packaging. In fact, Samsung has established Tizen.Net which is a new cross-platform application development that has been integrated with Visual Studio. Nowadays, most of the SDK tools are relying on a uni?ed approach to the development technology for smart TV apps. The technologies behind the appli-cations are JavaScript, HTML5, and CSS3. JavaScript is used as a standard programming language to program the behavior of the applications. The use of JavaScript adds the page jumping capability of the application. It enables the developer also to code complex expressions and calculations like condi-tional branches, and loops. The ?fth version of the Hypertext Markup Language (HTML5) is used as the latest version for developing the web elements’ structure and content. The HTML5 is essential to develop the structure of the application page even without the JavaScript code, but that will lack the interactivity with the user [12]. Finally, the third version of the Cascading Style Sheets (CSS3) is used for the presentation of these web elements and polishing them for better visualization. These essential components are forming the latest and best tech-nology of the smart TV application, and also they are the newest technology for the World Wide Web. In general, Smart TV app could be one of two types, installed or cloud-based. Installed TV app is a stand-alone app installed on the smart TV without the need for the Internet connection, while the cloud-based TV app works as an interface between the cloud and the TV with a shallow content (almost no additional functionality) when there is no Internet connection. 4 The Analogy and Di?erences of Smart TV and Mobile Apps There are many similarities and di?erences between the Mobile and Smart TV apps. These similarities and di?erences could be seen in three dimensions, (1) Functionality, (2) Design, and (3) User interaction. Both applications are working on smart devices. Hence, the functionality could be similar, as they are both connected to the Internet. The mobile apps 5 https://www.josh?re.com/. 6 https://www.mautilus.com. 7 http://www.smarttv-alliance.org. 246 B. S. Ahmed and M. Bures could be useful even without connection to the Internet; however, several smart TV apps are useless without the network connection. The computation power of the smart device also could de?ne the functionalities of the application itself. In fact, the mobile apps could be more functional than smart TV apps because the mobile devices nowadays may have more computational power than smart TVs. In addition, the aim of the mobile apps is almost di?erent from the smart TV apps. Speaking about the application design, there are many di?erences. For exam-ple, the size of the screen and icons could de?ne the layout of the application. Smart TV screens are wider than the mobile devices. The background color of the smart TV apps could be di?erent from the color in the mobile devices. From the user interaction point of view, smart TV apps are having less text entry as it is di?cult to enter text from the remote device. 
Most of the smart TV apps are designed to get the content from the Internet when connecting whereas this is not the case for the mobile apps, as they could be standalone applications without Internet connections interfaces8 . The typical smart TV application is much more straightforward than the mobile app, especially in the design layout. The way that the user interacts with the application de?nes an essential di?erence between the smart TV and mobile apps. The user of the mobile app interacts directly with the application without an intermediate device, while in the smart TV application, the user interacts with the help of a remote device. In fact, the UI of the smart TV apps sometimes called 10-foot user interfaces since the 10 ft (3 m) distance from the TV is the standard distance between the user and the TV. The developers are considering this distance when developing the user interface [11]. Using the remote device with this distance is not user-friendly and not responsive. Hence, the UI must consider this signi?cant di?culty. As mentioned previously in Sect. 2, this interaction di?erence will be signi?cant also when approaching the testing process with model-based testing. 5 Navigation and Control in Smart TV Apps As mentioned previously, navigation on a smart TV application is through the remote device. Although some new TV devices are o?ering the direct interaction by the user with the screen, the most common interaction with the TV is still the remote device. The remote device consists of four essential navigation Right, Left, Up and Down. In addition, the remote device has an OK button to choose any selected view on the application after exploration. These ?ve key buttons should work properly while using an application. Figure 1 shows an example of the TV remote device. In addition to those ?ve buttons, there are many other buttons on the remote device that vary from a TV brand to another depending on the level of function-alities. Some of them are related to the hardware functionalities of the TV itself, as the power button to turn ON/OFF the TV. There are also ten buttons (from 0–9) for channel jumps and even entering numbers in text ?elds if necessary. 8 https://bit.ly/2IiNb30. Testing of Smart TV Applications 247 Fig. 1. TV remote device. The UI layout of any application plays a primary rule in the testing process. Understanding the layout could lead to an e?cient test generator and runner. Smart TV apps are following some limited number of layout patterns. Figure 2 shows three main patterns in which most of the smart TV apps are following. In fact, layout (b) is mostly used, since it puts many views in one window. Fig. 2. Three main layout design patterns for smart TV apps [13]. The remote device is putting constraints on the navigation from a view to another because it supports just one step navigation. Hence, each move on the layout is a step. This would not be a problem when two views are adjacent; however, for those non-adjacent views, more than one step is needed to move from one view to another. This navigation is very important when coming to the test generation strategy based on the application’s model. 6 Open Problems and Challenges In this section, we discuss di?erent problems and challenges that need to be addressed for the smart TV app testing. In the following subsections, we will address each problem, the challenges to solve the problem and our suggestions. 248 B. S. Ahmed and M. 
Bures 6.1 Start Point of Navigation One of the ?rst problems that the tester face when testing a smart TV app is the position of the navigational cursor. Technically speaking, from a JavaScript developer point of view, this happened when the focus point is not set in the application. For several applications on the store, this focus point is not set by the developers. As a result, when the application runs on the emulator, there is no pre-selected view on the application. The user must use the remote device to chose a view. Hence, the starting point of the navigator is missing. This problem is happening clearly with cloud-based TV apps because the views are changing in real-time with the cloud content. In fact, this is a challenging issue because it prevents the pre-generation of test sets. One solution to this problem is to let the tester choose the starting point of the testing. Yet, there could be a problem of good or bad selection point. Some starting points may lead to explore the app window sooner by navigating faster on the views. 6.2 Repository and Benchmark In general, any software testing veri?cation and validation process should be eval-uated through some benchmarks. These benchmarks could be real instrumented programs with some properties for testing. For example, many testing strategies are using the benchmarks available at Software-artifact Infrastructure Reposi-tory website9 for benchmarking and evaluation. For android testing, there are di?erent applications for testing. For instance many papers were using TippyTip-per10 , PasswordMaker Pro11 , MunchLife, K-9 Mail12 , Tomdroid13 , AardDict14 , and a few other applications for testing. In smart TV apps testing, we don’t have enough applications for benchmark-ing, and we don’t have a repository to store some benchmarks. In fact, there are two reasons behind this. First, smart TV apps are new and more time may be needed for the developers to create and publish open source applications. Sec-ond, the testing process of smart TV app is not de?ned yet, and the research is not initialized, in which this paper could be an e?ort toward that. Samsung maintains a page with some simple applications and examples15 . One solution for this di?culty is to develop applications for testing purposes. Here, the reliability of the testing process would be an issue. However, for better reliability, the testing and development groups could be separated. 9 http://sir.unl.edu/portal/index.php. 10 https://tinyurl.com/yd77qfzd. 11 https://tinyurl.com/ma65bc8. 12 https://tinyurl.com/6mzfdaa. 13 https://launchpad.net/tomdroid. 14 https://github.com/aarddict/android/issues/44. 15 https://bit.ly/2qC5ncS. Testing of Smart TV Applications 249 6.3 Test Generator In mobile app testing, most of the test generation strategies were almost inspired by other UI test generation strategies. For example, the test generator strategy of MobiGUITAR [7] framework was adapted from the GUITAR [8] framework for GUI testing. However, this method could not be followed in smart TV apps. Due to the user interaction di?erence in smart TV app, it is hard to adapt some test generator strategy from GUI or mobile app testing. For this reason, there is a need to develop a new test generation strategy. Although relying on previously investigating strategies is not clear at this early stage, following principles and concepts of model-based testing is still valid. Here, after deciding on the model and notations, the coverage criteria of the testing strategy would be another issue. 
Defining the coverage criteria depends mainly on the functional and non-functional requirements under test.

6.4 Activity Exploration

The test generation stage cannot be performed without input to the generator algorithm. For functional or non-functional testing, the input will most probably consist of two things: the number of events to test and the coverage criteria. As mentioned previously, the coverage criteria can be defined based on a predefined testing strategy. However, obtaining the input views for the test generation algorithm may require an exploration of the entire UI activity (i.e., window) of the smart TV app. Activity exploration is not a big issue (at least technically) when we have the source code of the application, i.e., in white-box testing: a simple code crawler could scan the HTML5 and CSS3 files, detect the views by parsing the code, and then feed the generator algorithm with these views (a minimal illustration of this white-box case is sketched at the end of this section). However, catching the views without the source code (i.e., black-box testing) can be a tricky job. A special algorithm is needed due to the special way the user interacts with the application through the remote device. In Sect. 7.1, we introduce an algorithm that creeps over the significant views of the application activity in a black-box manner.

6.5 Stopping Criteria

Stopping criteria for smart TV apps can be an issue, especially for cloud-based applications. In an installed TV app, there is a finite number of views that the creeper can catch and the testing strategy can cover; when the coverage criteria are met, the testing strategy may stop, so coverage can serve as the stopping criterion. However, in cloud-based apps there can be an effectively unbounded number of events appearing through the real-time feed from the cloud. For example, the YouTube smart TV app presents new views (i.e., videos) as the user scrolls down in the application. Practically, there could be a massive number of views, and the number of views may also vary with each new start of the application. One solution to this challenge is to define a finite number of iterations over which the creeper may iterate, or to limit the number of views to be covered before stopping.

6.6 Test Suite Ripper

When generating the test cases, we expect some obsolete or invalid test cases. For example, some views detected during the creeping process may not be valid and yet still appear in the test cases. To this end, a test ripper is needed to repair those test cases that are not valid. The test ripper may follow an algorithm to repair the test cases, for example by matching several predefined patterns of invalid test cases or of invalid transitions from one view to another. Another repair process could be specific to the remote device: for example, the color buttons on the remote could be used for several functional and non-functional requirements depending on the application configuration.

6.7 Test Runner

When the creeper has detected the views and the test cases have been generated and repaired by the test generator and ripper, a test runner is needed to execute them. A test runner simply takes the test suite and runs the test cases one by one automatically. Here, the same test runner strategy used in Android app testing could be followed for smart TV apps; however, executing the test cases depends on the development kit.
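To make the white-box case of Sect. 6.4 concrete, the sketch below extracts candidate views from an HTML5 file by parsing the markup. It is only an illustration: the assumption that navigable views are marked with a CSS class named "view", and the file name used in the example, are hypothetical and would differ per application.

from html.parser import HTMLParser

class ViewExtractor(HTMLParser):
    # Collects the ids of elements that look like navigable views.
    def __init__(self):
        super().__init__()
        self.views = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = attrs.get("class", "").split()
        if "view" in classes and "id" in attrs:
            self.views.append(attrs["id"])

def extract_views(html_path):
    # Parse one HTML5 file and return the ids of its candidate views.
    extractor = ViewExtractor()
    with open(html_path, encoding="utf-8") as f:
        extractor.feed(f.read())
    return extractor.views

if __name__ == "__main__":
    print(extract_views("index.html"))  # hypothetical entry page of the app project

A real crawler would also read the CSS3 files to recover the spatial arrangement of the views, since the layout determines which views are adjacent for remote-device navigation.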
6.8 Fault Taxonomy and Categorization

After running the test cases on the application, an important task is to identify the encountered faults and the test cases to which these faults relate. However, the faults of smart TV apps are not yet well known, and classical mutation testing is not directly applicable. For example, Deng et al. [14] recently identified different faults in Android apps within a mutation testing framework for mobile devices; those faults are Android-oriented and not applicable here. In addition, some of them relate to Activity faults, for example changing the screen orientation, which is also not appropriate because a smart TV screen is too large to be frequently reoriented. Classical mutation testing tools such as MuDroid [15] or MuJava [16] are normally used for mobile, web, or desktop apps and, as mentioned, they are platform-specific tools.

An important effort in this direction was made by Cui et al. [9], who identified eight types of faults in smart TV applications: TV system halt, TV system reboot, displaying a black screen, having sound but no image, playing images with delay, application exit by exception, playing images with a blurry screen, and a key having no response or the wrong response. While this is an excellent effort toward fault categorization, there is a need to identify more faults related to the application itself, since some of the identified faults may relate to the TV device rather than the app. There is also a need for a method to inject these faults into the smart TV. A significant effort here would be a study defining a taxonomy of faults in smart TV apps. A useful input to such a study could come from the smart TV industry, especially from companies that track and collect user feedback in the cloud; an analytical study categorizing the faults in such data would be an excellent finding.

6.9 Defining Test Oracle

Defining the pass and fail criteria is a challenging task in the software testing process. Within test automation, the mechanism for determining whether a given test case passes or fails is called the test oracle, and the problem of distinguishing correct from incorrect behavior is known as the "test oracle problem" [17]. A classical approach is manual identification of pass and fail by the developer; however, for a significant number of test cases this is inaccurate and impractical. Automating test oracles in smart TV app testing is not an easy task, since we do not yet know precisely the nature and kinds of faults these applications face. In addition, the dynamic behavior of cloud-based smart TV applications may lead to random new views being loaded. This task is therefore connected to the fault taxonomy and categorization discussed in Sect. 6.8: once we know the faults and can categorize them, we can define the test oracle for the automated testing framework.
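As a rough illustration of how the fault categories of Cui et al. [9] (Sect. 6.8) could feed an automated oracle (Sect. 6.9), the sketch below maps emulator log lines to fault labels with simple pattern matching. The log-message patterns are hypothetical placeholders; the real messages depend on the emulator and SDK, and a practical oracle would need a far richer taxonomy.

import re

# Hypothetical mapping from log patterns to the fault types of Cui et al. [9].
FAULT_PATTERNS = {
    r"system halt":        "TV system halt",
    r"system reboot":      "TV system reboot",
    r"black screen":       "Displaying a black screen",
    r"no video signal":    "Sound but no image",
    r"frame delay":        "Playing images with delay",
    r"uncaught exception": "Application exit by exception",
    r"render blur":        "Blurry screen",
    r"key timeout":        "Key has no or wrong response",
}

def classify_log_line(line):
    # Return the fault label matching a log line, or None if it looks healthy.
    for pattern, fault in FAULT_PATTERNS.items():
        if re.search(pattern, line, re.IGNORECASE):
            return fault
    return None

def oracle(log_lines):
    # A test case passes when no log line maps to a known fault category.
    faults = [f for f in map(classify_log_line, log_lines) if f]
    return ("fail", faults) if faults else ("pass", [])

if __name__ == "__main__":
    print(oracle(["app started", "Uncaught exception in player.js"]))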
7 Towards an Automated Testing Strategy

Based on the problems and challenges presented so far, we propose an automated framework to test smart TV apps. This framework presents our vision of a strategy to automate the testing process. The framework works with the Tizen SDK, which includes a smart TV emulator; however, it is a general framework and is applicable to other emerging SDKs in the future.

Figure 3 shows an overview of this framework and illustrates the essential components and their relationships to each other. The framework supports both white-box and black-box testing styles; the tester chooses between them depending on the availability of the source code and the application type. As mentioned previously, even when the source code is available, a cloud-based app must be treated as a black-box testing case. When the source code is available, the tester imports the project and lets the framework do the rest automatically: the creeper scans the source code and tries to identify the essential views in the UI.

Fig. 3. Smart TV app testing framework.

In the case of black-box testing or a cloud-based app, which is probably the most critical case, the creeper must use a special algorithm to creep over and detect all the views. Details of this algorithm are presented in the following section (Sect. 7.1). Here, the creeper uses the log messages from the TV emulator to validate the views. In both the white-box and black-box approaches, the creeper detects the essential views and converts all the views and their relationships into a state machine graph model. This model is the input to the test generator, which consists of a model-based generation algorithm and a test ripper to repair the test cases. The repair is based on predefined patterns of invalid test cases, and the process iterates as long as an invalid test case remains. The framework then executes the test cases through a test runner on the TV emulator, an automated test oracle module validates them one by one, and finally a test report is presented to the user.

7.1 Application Creeper

To detect all the necessary views in the application that need to be present in the model for test generation, we have developed an algorithm called EvoCreeper. Object detection in the UIs of mobile, desktop, and web apps is not new; there are algorithms called crawlers that crawl over the UI and detect such objects. None of those algorithms is directly useful here, since the user interaction behavior in smart TV apps is entirely different. Besides, we consider the name "creeper" to suit what we want to do, as the word "crawler" carries a different meaning due to its use in web and search engine technologies.

Algorithm 1 shows the steps of EvoCreeper. If the focus point is not set by the app developer, EvoCreeper starts with an action from the tester, who chooses at least one view to start from; otherwise, it starts from the focused view. From this view, the creeper starts creeping over the UI evolutionarily and incrementally. The algorithm takes four directions, DUp, DDown, DLeft, and DRight, plus the OK button, from each view to move. When a new view is discovered in a direction (i.e., newView = Active), the algorithm adds it to the list of views to be modeled, Lv.

Algorithm 1. EvoCreeper Steps
1   Input: v1 is the user-selected view
2   Output: list of views to be modeled, Lv
3   Iteration It ← 1
4   Maximum iteration Itmax ← max
5   While ((It < Itmax) ∧ (newView ≠ null))
6       Use v1 as a start point
7       From v1 generate five possible directions DUp, DDown, DLeft, DRight, OK
8       For each direction
9           Navigate a step
10          Monitor the emulator log for a reaction
11          If newView = Active
12              add newView to Lv
13          End If
14          It++
15      End For
16  End While
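A compact Python sketch of Algorithm 1 is given below. The emulator interface is a hypothetical abstraction (the concrete calls depend on the Tizen SDK tooling, and view discovery would in practice be confirmed through the emulator log); the toy GridEmulator is a static 12-view window loosely modeled on the proof of concept of Sect. 7.2.

from collections import deque

DIRECTIONS = ["Up", "Down", "Left", "Right", "OK"]

def evo_creeper(emulator, start_view, max_iterations=1000):
    # Iterative, breadth-first discovery of views through remote-key steps.
    # `emulator` is a hypothetical wrapper exposing:
    #   focus(view)      -- move the cursor to an already discovered view
    #   press(direction) -- send one remote key; return the newly active view or None
    discovered = [start_view]        # Lv: list of views to be modeled
    frontier = deque([start_view])   # views whose neighbours are still unexplored
    iteration = 0
    while frontier and iteration < max_iterations:
        view = frontier.popleft()
        for direction in DIRECTIONS:
            emulator.focus(view)                  # start each step from `view`
            new_view = emulator.press(direction)  # in practice: monitor the emulator log
            iteration += 1
            if new_view is not None and new_view not in discovered:
                discovered.append(new_view)
                frontier.append(new_view)
    return discovered

class GridEmulator:
    # Toy stand-in for the TV emulator: a 3x4 grid of views v1..v12.
    MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1), "OK": (0, 0)}

    def __init__(self, rows=3, cols=4):
        self.rows, self.cols = rows, cols
        self.pos = (0, 0)

    def focus(self, view):
        index = int(view[1:]) - 1            # 'v5' -> row 1, column 0
        self.pos = divmod(index, self.cols)

    def press(self, direction):
        dr, dc = self.MOVES[direction]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if 0 <= r < self.rows and 0 <= c < self.cols and (r, c) != self.pos:
            self.pos = (r, c)
            return "v%d" % (r * self.cols + c + 1)
        return None                          # no reaction: edge of the window or OK

if __name__ == "__main__":
    print(evo_creeper(GridEmulator(), "v1"))  # discovers v1..v12 from the worst-case start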
Algorithm 1 continues until no new views are discovered. As an additional stopping criterion, the algorithm takes a preset maximum number of iterations to avoid an endless discovery loop in some special cases of cloud-based apps. In the following section (Sect. 7.2), we present an example as a graphical proof of concept for this algorithm.

7.2 Proof of Concept

In this section, we present a proof of concept for the application creeper of Algorithm 1. We consider a cloud-based app as the pilot example, as it is the most difficult scenario. As shown in Fig. 4, each activity window has 12 views, and as the user shifts down or right, new activities may appear. We consider three iterations of the algorithm and assume that the tester chooses v1 as the start point. In fact, v1 is the worst-case choice; we observed that choosing a view in the middle of the window may lead to fewer iterations and better recognition of the views. From v1, the algorithm considers the four main directions DUp, DDown, DLeft, and DRight plus the OK button. However, here we consider only the four directions, because the OK button may open a new window in the app. For each direction, the creeper checks for new events, which are most likely new views.

In the first iteration, starting from v1, the up and left directions Du and Dl do not lead to new views, while the right direction Dr leads to v2 and the down direction Dd leads to v5. In the next iteration, the creeper starts from the newly discovered views, here v2 and v5. From v2, the new views v3 and v6 are identified by the creeper; in addition, v1 is rediscovered in the Dl direction but is ignored, as it is already in the view list. From v5, the views v1, v9, and v6 lie in the three directions Du, Dd, and Dr, respectively; however, only v9 is considered a new view.

Fig. 4. Proof of concept of the EvoCreeper.

The third iteration again starts from the newly discovered views v3, v6, and v9. In the same way, considering the four directions from each view and filtering out all repeated views, four new views are identified: v4, v7, v10, and v13. The EvoCreeper algorithm thus works in an iterative, evolutionary style to discover new views and events in the application under test. As mentioned, this pilot example considers a cloud-based app, so there is no expectation of a finite number of views in the application; here our proposed stopping criteria are useful, and the creeper continues for a fixed number of iterations or until no new views are discovered.

8 Functional and Non-functional Testing Opportunities in Smart TV Applications

To test a functional or non-functional requirement in a smart TV app, we need a measure. This measure can be used in the test generation process as a coverage criterion and also in the design of the test oracle. While this is straightforward for functional requirements, converting a non-functional requirement into an exact measure is a tricky task; here, an approximation can be useful. Many problems could be addressed. For example, determining the minimum hardware requirements for a specific smart TV application would be an interesting topic to investigate: most smart TV devices on the market today rely on CPUs and memory with low computational power, and extra hardware may be used to measure the energy consumption of the CPU during the testing process.
Covering the event interactions at different levels is also an interesting functional testing target. Here, full, partial, or systematic coverage of the events is a decision that must be made by the tester, and a comparison of these three coverage criteria is an important study topic to learn which approach is better for fault finding.

The limitations in memory and CPU lead to another interesting non-functional requirement that may be used in the testing process: execution time. It would be interesting to know which situations and sequences in a smart TV application cause long or short execution times. This could also be useful for identifying and detecting security vulnerabilities; in fact, security is an essential issue in smart TV applications that has not been addressed before.

Probably the most essential non-functional requirement to be addressed in smart TV applications is usability. Because interaction relies on the remote device, usability testing is necessary; indeed, the remote device remains the main constraint on the usability of smart TV applications. At this early research stage, it is useful to address how to make the applications more usable and which factors affect usability. User-oriented testing techniques may be more realistic here; however, an automated testing method could support the final usability testing report.

9 Conclusion and Future Work

In this paper, we have presented the key ingredients, challenges, and some proposed solutions for smart TV app testing. We think that in the near future, smart TV apps will be an essential piece of software in the whole context of IoT services. Despite this importance, there is no systematic and robust testing strategy for smart TV apps in the literature. After an extensive study of these applications, we discovered many open problems and challenges, which we have illustrated in this paper. We found that the most crucial problem to be solved is the test generation strategy. We proposed a fully automated framework to test smart TV apps, and we illustrated our EvoCreeper algorithm, which creeps over the views available in the application window. The algorithm uses an iterative, evolutionary style to discover new views, and its output serves as input to the test generation strategy that produces the necessary test cases for the automated testing framework. Depending on the testing process, there are many further opportunities in smart TV app testing; for example, security, usability, scalability, and robustness testing are essential issues that have not been addressed in the literature. Our proposed framework is also useful for these non-functional properties, simply by altering the test oracle and test generator components. As future work, we plan to present a more comprehensive strategy together with testing results for different smart TV apps.

Acknowledgment. This research is conducted as a part of the project TACR TH02010296 Quality Assurance System for Internet of Things Technology.

References 1. Jung, K.S.: The prospect of Smart TV service. Inf. Commun. Mag. 28(3), 3–7 (2011) 2. Zein, S., Salleh, N., Grundy, J.: A systematic mapping study of mobile application testing techniques. J. Syst. Softw. 117(C), 334–356 (2016) 3. Sahinoglu, M., Incki, K., Aktas, M.S.: Mobile application verification: a systematic mapping study, pp.
147–163. Springer, Heidelberg (2015) 4. Amal?tano, D., Fasolino, A.R., Tramontana, P., Robbins, B.: Chapter 1 - testing android mobile applications: challenges, strategies, and approaches. In: Advances in Computers, vol. 89, pp. 1–52. Elsevier (2013) 5. Banerjee, I., Nguyen, B., Garousi, V., Memon, A.: Graphical user interface (GUI) testing: systematic mapping and repository. Inf. Softw. Technol. 55(10), 1679–1694 (2013) 6. Li, Y.-F., Das, P.K., Dowe, D.L.: Two decades of web application testing-a survey of recent advances. Infor. Syst. 43(C), 20–54 (2014) 7. Amal?tano, D., Fasolino, A.R., Tramontana, P., Ta, B.D., Memon, A.M.: Mobi-guitar: automated model-based testing of mobile apps. IEEE Softw. 32(5), 53–59 (2015) 8. Nguyen, B.N., Robbins, B., Banerjee, I., Memon, A.: Guitar: an innovative tool for automated testing of GUI-driven software. Autom. Softw. Eng. 21(1), 65–105 (2014) 9. Cui, K., Zhou, K., Song, H., Li, M.: Automated software testing based on hierar-chical state transition matrix for Smart TV. IEEE Access 5, 6492–6501 (2017) 10. Ingrosso, A., Volpi, V., Opromolla, A., Sciarretta, E., Medaglia, C.M.: UX and usability on Smart TV: a case study on a T-commerce application, pp. 312–323. Springer, Cham (2015) 11. Sabina, K.C.: De?ning a testing platform for Smart TV applications. Bachelor thesis, Helsinki Metropolia University of Applied Sciences, January 2016 12. Bluttman, K., Cottrell, L.M.: UX and usability on Smart TV: a case study on a T-commerce application. McGraw Hill Professional, Cham (2012) 13. Murgrabia, M.: Design considerations for Vewd app store applications (2017). Accessed 5 Dec 2017 14. Deng, L., O?utt, J., Ammann, P., Mirzaei, N.: Mutation operators for testing android apps. Inf. Softw. Technol. 81(C), 154–168 (2017) 15. Moran, K., Tufano, M., Bernal-Cardenas, C., Linares-Vasquez, M., Bavota, G., Vendome, C., Di Penta, M., Poshyvanyk, D.: Mdroid+: a mutation testing frame-work for android. In: 40th International Conference on Software Engineering (ICSE) (2018) 16. Ma, Y.-S., O?utt, J., Kwon, Y.R.: MuJava: an automated class mutation system: research articles. Softw. Test. Verif. Reliab. 15(2), 97–133 (2005) 17. Barr, E.T., Harman, M., McMinn, P., Shahbaz, M., Yoo, S.: The oracle problem in software testing: a survey. IEEE Trans. Softw. Eng. 41(5), 507–525 (2015) Dynamic Evolution of Simulated Autonomous Cars in the Open World Through Tactics Joe R. Sylnice and Germ´ an H. Alf´erez(B) School of Engineering and Technology, Universidad de Montemorelos, Apartado 16-5, Montemorelos, N.L. 67500, Mexico 1140134@alumno.um.edu.mx, harveyalferez@um.edu.mx Abstract. There is an increasing level of interest in self-driving cars. In fact, it is predicted that fully autonomous cars will roam the streets by 2020. For an autonomous car to drive by itself, it needs to learn. A safe and economic way to teach a self-driving car to drive by itself is through simulation. However, current car simulators are based on closed world assumptions, where all possible events are already known as design time. Nevertheless, during the training of a self-driving car, it is impossible to account for all the possible events in the open world, where several unknown events may arise (i.e., events that were not considered at design time). Instead of carrying out particular adaptations for known context events in the closed world, the system architecture should evolve to safely reach a new state in the open world. 
In this research work, our contribution is to extend a car simulator trained by means of machine learning so that it evolves at runtime with tactics when the simulation faces unknown context events.

Keywords: Autonomous car · Tactics · Dynamic evolution · Open world · Machine learning

1 Introduction

A human driver learns by practicing how to drive and how to detect problems in the car and on the road. It is basically the same for autonomous cars: these cars learn how to drive from historical data. However, a self-driving vehicle is very expensive to build and maintain. In fact, there are reports that NVIDIA is selling its self-driving processing unit for about $15,000 [1], which is very expensive considering that this is the price of the processing unit alone. Also, it is dangerous and careless to unleash a self-driving car without proper training and testing. Simulations for proving new approaches in autonomous cars could solve the aforementioned problems in the academic world, and especially in developing countries with limited financial resources.

In the closed world, all the possible context events are known beforehand (i.e., at design time or during training under a machine-learning approach). However, in the open world, unknown context events can arise (e.g., a sudden malfunction in one of the car sensors). This kind of event has to be handled efficiently in order to prevent problems for the driver and passengers. Moreover, although there are open-source simulators, these simulators do not manage uncertainty in the open world.

In this research work, our goal is to extend the applicability of machine learning by means of tactics to carry out the dynamic evolution of simulated autonomous cars in the open world. Tactics are last-resort survival actions to be used when the simulated car does not have predefined adaptation actions to deal with arising problematic context events in the open world [2]. In order to apply tactics in the open world, the source code of a car video game was modified. First, the car was trained with the following supervised learning algorithms: K-Nearest Neighbors, Logistic Regression, Support Vector Machines, and Decision Trees. Then, unknown context events were injected at runtime to evaluate how the car faces those events with tactics.

This paper is organized as follows. Section 2 presents the justification for this research work, Sect. 3 the underpinnings of our approach, and Sect. 4 related work. Section 5 presents the results. Finally, Sect. 6 presents the conclusions and future work.

2 Justification

The research field of self-driving cars is a hot topic nowadays. However, the technology behind a self-driving car relies heavily on state-of-the-art software and very expensive hardware. That is why simulation tools are increasingly used in the field: they provide the mechanisms to test and evaluate the system of a self-driving car without having to buy (or even damage) very expensive hardware [3]. Predefined adaptation actions for known context events in the closed world are not enough in the open world, where several unknown context events can arise. Despite the recognized need for handling unexpected events in self-adaptive systems (SAS) [4], the dynamic evolution of SAS in the open world is still an open and challenging research topic.
In order to visualize the impact of unknown context events in the open world, let us imagine a self-driving car that has been trained with machine learning. The training was carried out with datasets composed of known historical data (e.g. data related to sonar and LiDAR sensors). In other words, the training was applied in the closed world. However, at runtime several unknown events may arise in the open world. For instance, although the sensors are highly cali-brated and thoroughly revised, it is possible that a sensor starts recording inac-curate data (e.g. because of a broken sonar sensor). This is a dangerous situation because inaccurate data could lead to an accident. If the car was not trained to face this kind of situations, then the following question arises: what will the car do? In order to answer this question, in addition to applying machine learning to train self-driving cars, it is necessary to count on mechanisms to lead the car to make the best decision despite unknown context events. Dynamic Evolution of Simulated Autonomous Cars in the Open World 259 3 Underpinnings of Our Approach Our approach is based on the following concepts (Fig. 1). Fig. 1. Underpinnings of our approach. 3.1 Machine Learning Machine learning can be de?ned as computational methods using experience to improve performance or to make predictions accurately. Experience can refer to past data that is used by the learner. The quality and size of the data are very important for the accuracy of the predictions made by the learner [5]. 3.2 Tactics Tactics are last-resort surviving actions to be used when a system does not have prede?ned adaptation actions to deal with arising problematic context events in the open world [2]. The use of tactics is common in sports, war, or even in daily matters to accomplish an end. For example, the most important goal during a battle is to win. However, unknown or unforeseen events, such as sur-prise assaults, may arise. These events may negatively a?ect the expected goal. Therefore, it is necessary to choose among a set of tactics to reach the goal (e.g. to escape vs. to do a frontal attack). Tactics are prede?ned at design time and are used at runtime to trigger the dynamic evolution of the self-driving car. The tactics are required to be known beforehand in order for the self-driving car to face uncertainty. However, these tactics are not associated with any speci?c recon?guration actions (as dynamic adaptation does) [6]. 3.3 Dynamic Evolution A self-driving car has to go from dynamic adaptation in the closed world to dynamic evolution in the open world in order to respond to unforeseen ongoing events. Dynamic adaptation can be referred to as punctual changes made to face particular events by activating and deactivating system features based on the current context. Meanwhile, dynamic evolution is not just about applying punctual adaptations to concrete events but it is the gradual growth of the system to a better state depending on the current context events [2]. 260 J. R. Sylnice and G. H. Alf´erez 3.4 Open World Open world can be referred to as a context where events are unpredictable, requiring that software reacts to these events by adapting and organizing its behavior by itself [7]. As far as we know, current simulated autonomous cars are based on the closed world assumption where the relationship between the car and the surroundings are known and unchanging. 
Nevertheless, in the open world where the aforementioned relationship is unknown, unpredictable, and constantly changing, the simulated car has to be able to evolve. 4 Related Work A fully autonomous car or self-driving vehicle is a car that is designed to be able to do all the work of maneuvering the car without the passenger never having to or is not expected to take control of the car at any time or any given moment [8]. A self-driving vehicle has to be able to identify faults in its system. If the faults are critical, the vehicle has to either ?x these faults or isolate them so that the system is not compromised [9]. Self-driving vehicles are equipped with state-of-the art sensors and cameras. Also, they use powerful software behind the hardware to maneuver themselves. The software learns how to drive through machine learning and the software sees through computer vision. There are several self-driving cars in development. For example, the Google Car is being developed by Google. Google hopes to have self-driving cars on the road by 2020. However, this company does not intend to become a car manufacturer. Uber also entered the world of self-driving cars in April 2015. In addition, Tesla expects to launch a fully autonomous car anytime in 2018. Also, in April 2015, BMW has partnered with Baidu the “Chinese Google”, to develop self-driving technology. There are several research works that propose simulations of autonomous cars. For instance, in [10] the authors propose a shader-based sensor to simulate the LiDAR and Radar sensors instead of the common method of ray tracing. They mention that sensor simulations are very important in the ?eld of self-driving cars. In this way, the sensors can be evaluated, tested and optimized. The authors state that ray tracing is an intensive task for the CPU. It is not problematic when the number of simulated rays and detected objects are small. However, in reality it becomes problematic or even impossible. According to the authors, a shader-based sensor simulation is an e?cient alternative to ray casting because it uses parallelism in the GPU and this helps in sparing CPU resources that the software can use in other areas. In [11], the authors mention that they have used a simulation tool called Scene Suite to generate simulated scenes of tra?c scenarios. The tool allows 2.5D simulations and uses patented virtual sensor models. The goal of this work is to show how the data from real world sensor models could be extracted and then to simulate the results using a scene based pattern recognition. Also, this paper introduced an approach for learning sensor models with a manageable Dynamic Evolution of Simulated Autonomous Cars in the Open World 261 demand on computational power based on a statistical analysis of measurement data clustered into scene primitives. In [12], the authors focus on the use of the agent-based simulation framework MATsim and how it could be applied to the ?eld of self-driving cars. Agent-based simulations are state-of-the-art transport models. Agent-based approaches com-bine activity-based demand generation and dynamic tra?c assignments. MAT- Sim is a simulation of multi-agent transport based on activity. It is an open source framework written in JAVA under the GNU license. MATSim’s strength is the modular design around a core, allowing new users to customize it without much e?ort. 
This work is based on the simulation of autonomous vehicles in a realistic environment at a large scale with individual travelers (vehicles) that adapt their movement dynamically with the others. In [13], the author uses an open source simulator to carry out the evaluation and application of a reinforcement learning approach to the problem of control-ling the steering of a vehicle. Reinforcement Learning (RL) is an area of machine learning in which an agent is placed into a certain environment and is required to learn how to take proper actions without having any previous knowledge about the environment itself. If the agent’s behavior is right, it is rewarded. If the behavior is wrong, the agent is punished. This learning system of reinforcement learning is called trial and error. In order to evaluate this approach, the Open Racing Car Simulator (TORCS) was used. In the TORCS environment a car is referred to as a Robot. In [3], the authors use an integrated architecture that is comprised of both a tra?c simulator and a robotics simulator in order to contribute to the self-driving cars simulation. Speci?cally, the proposed approach uses the tra?c sim-ulator SUMO and the robotics simulator USARSim. These tools are open source and have good community support. In one hand, SUMO is a microscopic road tra?c simulator written in C++. It was designed by the Institute of Transporta-tion Systems at the German Aerospace Center to handle large road networks. On the other hand, USARSim is an open-source robotics simulator written in Unreal Script, which is the language of the Unreal game engine. It has high qual-ity sensor simulation and physics rendering. The authors modi?ed the SUMO and USARSim simulators in order to be able to implement the architecture for the self-driving car simulation. The result is a simulator in which a self-driving vehicle can be deployed in a realistic tra?c ?ow. In [14], the authors describe the global architecture of the simulation/proto-typing tool named Virtual Intelligent Vehicle Urban Simulator (VIVUS) devel-oped by the SeT Laboratory. The VIVUS simulator simulates vehicles and sen-sors. It also takes into account the physical properties of the simulated vehicle while prototyping the arti?cial intelligence algorithms such as platoon solutions and obstacle avoidance devices. The goal of VIVUS is therefore overcoming the general drawbacks of classical solutions by providing the possibility of designing a vehicle virtual prototype with simulated embedded sensors. In [15], the authors combine a tra?c simulator and a driving simulator into an integrated framework. They have used the driving simulator SCANeR 262 J. R. Sylnice and G. H. Alf´erez developed by Renault and Oktal, and the AIsum tra?c simulator developed by TSS-Transport Simulation Systems. The framework enables a driver to use the simulator with a local tra?c situation managed by a nano tra?c model that is realistic for the driver and that also provides a realistic global tra?c situation in terms of ?ow and density. The framework can provide information on the simu-lated vehicles and the tra?c situation for the short-ranged sensors: camera and radar and also the long-ranged sensors: wireless and embedded navigation. It also enables the driver and other systems to be involved in an extensive assortment of tra?c situations, accidents, rerouting, road-work zones, and so on. 
5 Results 5.1 Methodology This project has been broken down in the following steps: Looking for an Open Source Car Simulator: To ?nd the open source car simulator, Google Search was used with the term “open source car simulator” in December 2017. The following is the list of the open source car simulators found: – TORCS1 : TORCS is a multi-platform car racing simulation. It is used as an ordinary car racing game, as an arti?cial intelligence (AI) racing game, and as a research platform. – Apollo2 : Apollo is an open-source autonomous driving platform created by Baidu. It has a high performance and ?exible architecture that supports fully autonomous driving capabilities and also has car simulation functionalities. – Udacity’s Self-Driving Car Simulator3 : This simulator was built for Udacity’s Self-Driving Car nanodegree to teach students how to train cars and how to navigate road courses using deep learning. Comparing Di?erent Open Source Car Simulators: The criteria for choosing the car simulator were the following: (1) it had to be open source to ?nd the points in which it could be extended; (2) it had to be mature enough in terms of documentation; (3) it had to be supported by the developer commu-nity; and (4) it had to be easily extensible in terms of programming. The results of the comparison are as follows: 1. TORCS meets three of the four criteria. Although, it is open source, mature, well known in the scienti?c world, and is greatly supported by the developer community, it misses the fourth criteria because it is not easily extensible in terms of programming. 1 http://torcs.sourceforge.net/index.php?name=Sections&op=viewarticle&artid=1. 2 https://github.com/ApolloAuto/apollo. 3 https://github.com/udacity/self-driving-car-sim. Dynamic Evolution of Simulated Autonomous Cars in the Open World 263 2. Apollo is a fully ?edged open autonomous driving platform that meets two of our criteria: it is open source and mature. However, it is a fully autonomous driving platform, much more complex than a simulator. Also, since it was released a couple of months prior to our search, it does not yet have a wide developer community support. Also, the documentation, written in Chinese is not yet translated. 3. Udacity’s self-driving car simulator falls short when it comes to documen-tation. As a result, although it is an open source software, the lack of free documentation makes it di?cult to extend the code. According to the evaluation, none of these simulators ful?lled our needs. There-fore, instead of searching for open source autonomous car simulators, we looked for an open source car game, which could be trained by means of machine learn-ing and extended for usage in the open world. We found an open source car game named Lapmaster4 . It is a simple car game designed with the pygame Python library. It consists of a car running around a circuit for a certain amount of laps. Also, the player is able to shift the gears. The goal of the game is to complete the laps as fast as possible. Fig. 2 shows a screenshot of this game. Fig. 2. Screenshot of the Lapmaster game. 4 http://pygame.org/project-Lap+Master-2923-4798.html. 264 J. R. Sylnice and G. H. Alf´erez Extending the Car Simulator: In this step, the Lapmaster car simulator was extended for the open world. Speci?cally, two steps were carried out: (1) collecting data from the context of the car for training; and (2) training the simulated car with machine learning. These steps are described as follows. 1. 
Collecting data from the context of the car: The source code of the car game was modified to collect the position (x and y coordinates) and the direction (0 = forward, 1 = right, 2 = left) of the car in every frame. Listing 1.1 shows the modified lines of the car's source code. On line 1, a while loop indicates that the code is executed while the car simulator is running. On line 2, the program detects which key is pressed. Starting at line 3, if the car is moving, the program checks whether the "d" key (right) or the "a" key (left) is pressed and sets the direction accordingly; if neither key is pressed, the direction is recorded as 0. Line 12 builds the l_data list with three values: the x and y coordinates and the direction. On lines 13 and 14, if the l_data list is not empty, it is passed to the Writer function together with the path of the log in which the contextual data is written. Listing 1.2 presents the Writer function, which writes the data in comma-separated values (CSV) format. The resulting CSV file contains 4,149 instances, obtained by running the game four times. The x and y coordinates were taken as the features for training, and the direction as the class.

1   while running:
2       key = pygame.key.get_pressed()
3       if red.gear > 0:
4           if key[K_d]:
5               red.view = (red.view + 2) % 360
6               d = 1
7           elif key[K_a]:
8               red.view = (red.view + 358) % 360
9               d = 2
10          else:
11              d = 0
12          l_data = [red.xc, red.yc, d]
13          if l_data:
14              data.Writer(l_data, path)

Listing 1.1. A fragment of the modified code of the Lapmaster source file.

import csv

def Writer(data, path):
    # Append one sample (x, y, direction) as a row of the CSV log.
    with open(path, "a") as c_file:
        write = csv.writer(c_file, delimiter=',')
        write.writerow(data)

Listing 1.2. Implemented function for data writing.

2. Training the simulated car: For the training of the simulated car, four supervised machine learning algorithms from the scikit-learn5 Python library were employed. The algorithms are the following [16]:

(a) K-Nearest Neighbor (KNN): It is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k nearest neighbors.

(b) Logistic Regression (LR): It is a classification algorithm used to estimate discrete values based on a given set of independent variables. It predicts the probability of occurrence of an event by fitting the data to a logit function.

(c) Support Vector Machine (SVM): In this classification algorithm, each data point is plotted in an n-dimensional space (n being the number of features), where the value of each feature is the value of a particular coordinate. A separating hyperplane (or decision boundary) then splits the data points into two or more groups. The further a data point lies from the decision boundary, the more confident the algorithm is about the prediction; the data points closest to the separating hyperplane are known as support vectors.

(d) Decision Trees (DT): In this classification algorithm, the data is split into two or more homogeneous sets based on the most significant attributes that make the sets distinct.

5 http://scikit-learn.org/stable/#.

The following are the steps used to train the simulated car: (1) a user ran the game to generate a dataset; (2) the KNN, LR, SVM, and DT algorithms were executed to obtain a classification for each class.
The classes were 0 for forward, 1 for right, and 2 for left; (3) the models were evaluated in terms of cross-validation; and (4) the simulated car was extended to use the most accurate classifier.

A fragment of the script that generates the classification models from the collected data is presented in Listing 1.3. The first line declares a list containing the information of the four classifiers used in the experiments. Next, a for loop iterates over this list in order to train and generate a model for each algorithm. Line 9 specifies the location and the name of the model that is going to be trained. On lines 12 and 13, the program splits the data into training and test sets; the code on line 11 indicates that the values are taken randomly from the dataset. On lines 14–16, a classification model is fitted and its cross-validation score is computed. On line 17, the mean accuracy of each algorithm is calculated. On lines 18 and 19, each model is evaluated on the test set and a classification report is generated. Finally, on line 21, the model generated by each algorithm is saved.

1   classifiers = [
2       ('kNN', KNeighborsClassifier(n_neighbors=4)),
3       ('LR', LogisticRegression()),
4       ('SVM', SVC()),
5       ('DT', DecisionTreeClassifier())
6   ]
7
8   for name, clf in classifiers:
9       filename = 'models/%s_%s.pickle' % (name, data.filename)
10      print('training: %s' % name)
11      rs = np.random.RandomState(42)
12      X_train, X_test, y_train, y_test = \
13          train_test_split(X, y, test_size=0.2, random_state=rs)
14      model = clf.fit(X_train, y_train)
15      cv = cross_val_score(clf, X_test, y_test, cv=10,
16                           scoring='accuracy')
17      acc = np.mean(cv)
18      predictions = clf.predict(X_test)
19      report = classification_report(y_test, predictions)
20      print('training %s done... acc = %f' % (name, acc))
21      pickle.dump(model, open(filename, 'wb'))
22      bm.append('%s %s' % (name, report))

Listing 1.3. A fragment of the code to train and generate the classification models.

Injecting Dynamic Evolution Through Tactics: In this step, we emulated a malfunctioning sonar sensor. This situation can cause accidents, since the car is no longer able to "see" its environment (e.g., other cars) properly. To trigger this event, a button on the keyboard is pressed. When the car system recognizes that an unknown context event has arisen, the "decelerate" tactic is triggered. This tactic progressively slows down the car until it reaches a full stop; the reasoning behind it is to prevent the car from driving on without properly detecting its surroundings. The implemented tactic is shown in Listing 1.4: when the "s" key is pressed on the keyboard, the slow variable is set to True to indicate that the car has to reduce its speed until it fully stops.

slow = False

key = pygame.key.get_pressed()
if key[K_s]:
    slow = True
if slow:
    # Progressively decelerate, faster in higher gears.
    red.speed = .95 * red.speed - .05 * (2.5 * red.gear)

Listing 1.4. A fragment of the source code for the decelerate tactic.

5.2 Outcomes

The accuracy of the models generated with the four algorithms is as follows: kNN = 0.9313, LR = 0.8927, SVM = 0.8927, DT = 0.929. Table 1 shows the cross-validation results of each model generated with the four classifiers. Only two classes are shown in Table 1, 0 for forward and 1 for right, because the circuit in the Lapmaster game has only right turns.
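Step (4) of the training procedure, extending the simulated car to use the most accurate classifier, is not shown in the fragments above. A minimal sketch of what it could look like follows; the pickle file name is a hypothetical stand-in for a model saved by Listing 1.3.

import pickle

MODEL_PATH = "models/DT_drive_log.pickle"  # hypothetical Decision Tree model file

with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

def predict_direction(x, y):
    # Map the car's current (x, y) position to a steering class:
    # 0 = forward, 1 = right, 2 = left.
    return int(model.predict([[x, y]])[0])

# Inside the game loop, the prediction can replace the keyboard input, e.g.:
#   d = predict_direction(red.xc, red.yc)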
Although the kNN algorithm has the best accuracy, the DT algorithm has better results in terms of precision, recall, and F1-score. These three terms are defined as follows [17]:

– Precision is the ability of the classifier not to label as positive a sample that is negative.
– Recall is the ability of the classifier to find all the positive samples.
– F1-score is a weighted mean of precision and recall.

5.3 Discussion

We published a video6 in which the "decelerate" tactic is effectively triggered at runtime. Although machine learning works fine in the closed world, i.e., where there are no unknown events (e.g., malfunctioning sensors), in the open world it is necessary to have additional mechanisms to face uncertainty. Therefore, we argue that autonomous cars trained by means of machine learning need to be extended with highly general tactics that try to defend the car in extreme conditions of uncertainty.

6 www.harveyalferez.com/autonomous-car-demo.html.

Table 1. Report for each of the algorithm models.

      Class      Precision  Recall  F1-score
kNN   0          0.95       0.99    0.97
      1          0.83       0.56    0.67
      Avg/Total  0.94       0.94    0.94
LR    0          0.89       1.00    0.94
      1          0.00       0.00    0.00
      Avg/Total  0.80       0.89    0.84
SVM   0          0.90       1.00    0.95
      1          1.00       0.03    0.07
      Avg/Total  0.91       0.90    0.85
DT    0          0.97       0.98    0.97
      1          0.82       0.71    0.76
      Avg/Total  0.95       0.95    0.95

6 Conclusions and Future Work

This research work extended the applicability of machine learning by means of tactics to carry out the dynamic evolution of a simulated self-driving car in the open world. To this end, four classifiers were executed and four models were generated and evaluated; the DT model was used in the simulated car after evaluation. Then, a tactic to face a simulated unknown context event in the open world was implemented. This tactic was used to prevent a situation in which the lives of the passengers could be put in jeopardy. Since this research work was limited to the implementation and application of one tactic, as future work we would like to propose additional tactics. For example, tactics related to non-functional requirements, such as availability and performance, could be used to keep or improve service levels. These tactics could also be handled during execution by means of models at runtime, as proposed in our previous work [2]. Moreover, we plan to test our approach on other tracks in which complex unknown context events could arise.

References 1. Frederic, L.: All new Teslas are equipped with NVIDIA's new drive PX 2 AI platform for self-driving. https://goo.gl/xNSo8B 2. Alférez, G.H., Pelechano, V.: Achieving autonomic web service compositions with models at runtime. Comput. Electr. Eng. 63, 332–352 (2017) 3. Pereira, J.L., Rossetti, R.J.: An integrated architecture for autonomous vehicles simulation. In: Proceedings of the 27th Annual ACM Symposium on Applied Computing, pp. 286–292. ACM (2012) 4. Cheng, B.H., De Lemos, R., Giese, H., Inverardi, P., Magee, J., Andersson, J., Becker, B., Bencomo, N., Brun, Y., Cukic, B., et al.: Software engineering for self-adaptive systems: a research roadmap. Software engineering for self-adaptive systems, pp. 1–26. Springer, Heidelberg (2009) 5. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press (2012) 6. Alférez, G.H., Pelechano, V.: Facing uncertainty in web service compositions. In: 2013 IEEE 20th International Conference on Web Services (ICWS), pp. 219–226. IEEE (2013) 7.
Baresi, L., Di Nitto, E., Ghezzi, C.: Toward open-world software: issues and chal-lenges. Computer 39(10), 36–43 (2006) 8. Coles, C.: Automated vehicles: a guide for planners and policymakers (2016) 9. Maurer, M., Gerdes, J.C., Lenz, B., Winner, H.: Autonomous driving: technical, legal and social aspects. Springer, Heidelberg (2016) 10. Wang, S., Heinrich, S., Wang, M., Rojas, R.: Shader-based sensor simulation for autonomous car testing. In: 2012 15th International IEEE Conference on Intelligent Transportation Systems, pp. 224–229. IEEE (2012) 11. Simon, C., Ludwig, T., Kruse, M.: Extracting sensor models from a scene based simulation. In: 2016 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pp. 259–264. IEEE (2016) 12. Boesch, P.M., Ciari, F.: Agent-based simulation of autonomous cars. IEEE Am. Control Conf. (ACC) 2015, 2588–2592 (2015) 13. Piovan, A.G.: A neural network for automatic vehicles guidance. ACE 10, 2 (2012) 14. Gechter, F., Contet, J.-M., Galland, S., Lamotte, O., Koukam, A.: Virtual intel-ligent vehicle urban simulator: application to vehicle platoon evaluation. Simul. Modell. Pract. Theory 24, 103–114 (2012) 15. That, T.N., Casas, J.: An integrated framework combining a tra?c simulator and a driving simulator. Procedia-Soc. Behav. Sci. 20, 648–655 (2011) 16. Harrington, P.: Machine Learning in Action. Manning Publications (2012) 17. Scikit-Learn: sklearn.metrics.precision recall fscore support. https://goo.gl/ 4xxkGJ Exploring the Quanti?ed Experience: Finding Spaces for People and Their Voices in Smarter, More Responsive Cities H. Patricia McKenna(?) AmbientEase and the UrbanitiesLab, Victoria, BC V8V 4Y9, Canada mckennaph@gmail.com Abstract. The objective of this paper is to explore the quanti?ed experience in the context of ?nding spaces for people and their voices in smarter and more responsive cities. Using the construct of awareness, this exploration is situated theoretically at the intersection of a?ective computing, social computing, and pervasive computing. This paper problematizes the quanti?ed experience in human computer interactions (HCI), arguing for smart and responsive cities to be enabled by more aware people interacting with and in?uencing aware technolo- gies. Aware people and aware technologies refer to the dynamic interweaving of sensing, sensors, and sensor networks through the Internet of Things (IoT), the Internet of People (IoP), and the Internet of Experiences. The methodology for this paper includes an exploratory case study approach and the research design incorporates multiple methods of data collection including survey and interviews. Findings from this work highlight the need for qualitative data using content analysis and other analytic techniques to augment, complement, and enhance the quantitative data being generated and gathered in urban spaces. This work is signi?cant in that it: (a) explores elements of the contemporary urban quanti?ed experience through the lens of awareness and the sub-constructs of adaptability and openness; (b) advances a framework for people-aware quanti?ed experiences in support of spaces for people and their voices in smarter, more responsive cities; and (c) further develops and innovates the research and practice literature for smart and responsive cities, in relation to people-aware quanti?ed experiences. 
Keywords: A?ective computing · Awareness Human Computer Interactions (HCI) · Internet of Experiences Internet of Things (IoT) · Internet of People (IoP) · Pervasive computing Quanti?ed experience · Responsive cities · Sensing and sensor networks Smart cities · Social computing 1 Introduction The main objective of this paper is to explore the quanti?ed experience in the context of ?nding spaces for people and their voices in smarter and more responsive cities. This work problematizes the quanti?ed experience in human computer interactions (HCI), arguing for smart and responsive cities to be enabled by more aware people interacting © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 269–282, 2019. https://doi.org/10.1007/978-3-030-02686-8_22 with and in?uencing aware technologies. Aware people and aware technologies refer to the dynamic interweaving of sensing, sensors, and sensor networks through the Internet of Things (IoT), the Internet of People (IoP), and the Internet of Experiences. Using the construct of awareness to explore the quanti?ed experience, this work is situated theo- retically at the intersection of a?ective computing, social computing, and pervasive computing. Methodologically, an exploratory case study approach is used in this work and the research design incorporates multiple methods of data collection including survey and interviews. Additional details about the methodology are provided in Sect. 3 of this paper. Brie?y, data were gathered from diverse individuals across multiple small to medium to large sized cities in several countries. Content analysis was used in the analysis of qualitative data and descriptive statistics in the analysis of quantitative data. A literature review was conducted for the Internet of Things, People, and Experi- ences and the complementing of quanti?ed experiences in the context of smart and responsive cities. The literature review enabled formulation of a theoretical perspective for this work. This work is signi?cant in that it: (a) explores elements of the contempo- rary urban quanti?ed experience through the lens of awareness and the sub-constructs of adaptability and openness; (b) advances a framework for people-aware quanti?ed experiences in support of spaces for people and their voices in smarter and more respon- sive cities; and (c) further develops and innovates the research and practice literature for smart and responsive cities, in relation to people-aware quanti?ed experiences. In the context of smart cities, future cities, and rapid urbanization globally, the need for a new urban agenda is advanced by the UN [1] that is, among other things, “people-centered and measurable”. Konomi and Roussos [2] observe a movement beyond the earlier conception of smart cities that emerged over the last decade “towards a deeper level of symbiosis among smart citizens, Internet of Things and ambient spaces”. Gold- smith and Crawford [3] advance the notion of responsive cities, leveraging digital tech- nologies and data analytics in combination with civic engagement and governance. In relation to the digital and aware technologies of sensing, sensors, and the Internet of Things (IoT), Hotho et al. [4] de?ne sensor using the Oxford English Dictionary, as “a device which detects or measures a physical property and records, indicates, or otherwise responds to it”. Hotho et al. 
[4] extend this de?nition to encompass “technological sensors as well as human sensors” and sensing that “relates to the psychosocial envi- ronment” as in “sensing danger”, as well as enabling “a higher level of integration and interpretation of di?erent external and internal signals”. Friberg [5] combines the notion of atmosphere and aesthetic education to propose an approach to the exploration of performing everyday practices in relation to an awareness of the sensorial and bodily in urban spaces. As such, the multi-sensorial capabilities of people described by Lévy [6] from a human geography perspective emerge as awareness, an important form of sensing. This introduction and background gives rise to the main research question under exploration in this work using the construct of awareness and the sub-constructs of adaptability and openness. Q1: How and why do people ?gure strongly in the making of more aware, adaptive, and open analytic spaces to complement existing approaches to quanti?ed experience in contemporary urban environments? 270 H. P. McKenna In summary, the primary purpose of this paper is to explore, innovate, and extend spaces for theoretical and practical debate for quanti?ed experiences in ways that involve people more directly, knowingly, and creatively. What follows is the development of a theoretical perspective for this work in the formulation of a conceptual framework for more people-aware quanti?ed experiences. The framework will then be operationalized for use in this work using quantitative data complemented with qualitative data. The methodology for this work is described and the ?ndings are presented along with an analysis and discussion. The limitations and mitigations of the work are discussed and future directions are identi?ed, followed by the conclusion. 2 Theoretical Perspective A review of the research literature was conducted for smart and responsive cities; the Internet of Things, the Internet of People, and the Internet of Experiences; and oppor- tunities for complementing the quanti?ed experience. This theoretical perspective forms the basis for the formulation of a conceptual framework for more people-aware quan- ti?ed experiences. 2.1 Smart and Responsive Cities Townsend [7] describes smart cities as “places where information technology is combined with infrastructure, architecture, everyday objects and even our bodies, to address social, economic, and environmental problems”. Kyriazopoulou [8] provides a literature review of architectures and requirements for the development of smart cities, highlighting the sectors identi?ed by Gi?nger et al. [9] of smart economy, people, governance, mobility, environment, and living as the focus for improvement. According to Kyriazopoulou [8], “o?ering citizens a great experience” is a primary goal of smart cities. Gil-Garcia et al. [10] identify 14 dimensions in conceptualizing smartness in government such as citizen engagement, openness, creativity, technology savvy, and resilience, to name a few. According to Gil-Garcia et al. [10] citizen engagement “allows two-way communication and enables collaboration and participation, fostering stronger and more intelligent relationships” while resilience contributes to the ability to “adapt to change”. Khatoun and Zeadally [11] provide a smart city model consisting of the Internet of Things (IoT), the Internet of Services (IoS), the Internet of Data (IoD), and the Internet of People (IoP) where the IoP highlights smart living and smart people. 
2.2 Internet of Things, People, and Experiences Herzberg [12] describes the Internet of Things (IoT) as “a network that enables physical objects to collect and exchange data” while describing the Internet of Everything as “a future wherein devices, appliances, people, and process are connected via the global Internet”. Vilarinho et al. [13] describe the use of activity feeds in social computing as a uni?ed communication mechanism for connecting the IoT with the IoP. Li [14] main- tains that the IoP “refers to digital connectivity of people through the Internet Exploring the Quanti?ed Experience: Finding Spaces for People 271 infrastructure forming a network of collective intelligence and stimulating interactive communication among people”. An infrastructure is proposed by Miranda et al. [15] in support of “moving from the Internet of Things to the Internet of People” where “smart- phones play a central role, re?ecting their current use as the main interface connecting people to the Internet”. According to Miranda et al. [15] key principles of the IoP include: social, personalized, proactive, and predictable. Indeed, Miranda et al. [15] employs the IoP concept to draw “the IoT closer to people, for them to easily integrate into it and fully exploit its bene?ts.” Conti et al. [16] argue for “a radically new Internet paradigm” in the form of “the Internet of People (IoP)” in which people move beyond “end users of applications” to “become active elements of the Internet.” McKenna [17] explored the experience of contemporary city environments through urban edges, surfaces, spaces, and the in-between in an e?ort to “complement, extend, and enrich algorithmic and network views.” Wellsandt et al. [18] describe the Internet of Experiences (IoE) in terms of an experience-centered approach “to complement human-centered innovation with experiences from arti?cial systems.” 2.3 Complementing Quanti?ed Experiences The United Nations [1] notes that, “urban space is being reimagined” while Casini [19] calls for smart city initiatives to move beyond a focus on “individual areas” toward a more “integrated approach” taking advantage of “new enabling infrastructures” in combination with sensor technologies. In this way, cities are encouraged to build upon existing structures in “exploiting synergies and interoperability between systems to deliver added value services for citizens to improve their quality of life” [19]. Falcon and Hamamoto [20] claim that the mass amounts of data being generated in everyday life “through the Web” and “on city streets” are opening the way for “bodies of data together with algorithms” that “will shape who we think we are” and “who we will become.” As mentioned earlier, Gil-Garcia et al. [10] identify creativity and openness as two of 14 key drivers for conceptualizing smartness in government. It is worth noting that, according to Amabile [21], a component of creativity is the open-endedness or heuristic dimension as distinct from “having a single, obvious solution (purely algo- rithmic).” And Dourish [22] points out that, “our experience of algorithms can change as infrastructure changes.” McKenna et al. [23] explored the potential for the assessment of creativity through an adaptation of the Consensual Assessment Technique (CAT) for use in technology-pervasive learning environments. Using a social radio application as an example of a social media space, McKenna et al. 
[23] explored environments “characterized by awareness, autonomy, collaboration, and real time data analytics potential.” McKenna and Chauncey [24] introduced the CAT into library, information, and learning spaces, proposing the technique be adapted to accommodate the assessment of creativity, inno- vation, and value in everyday, in-the-moment activities. As such, the CAT was explored [24] in terms of involving people more directly and knowingly in new partnering and collaborative opportunities in relation to data and learning analytics. By extension, this current work proposes the consideration of similar techniques for more meaningfully and directly involving people in the analysis and assessment of quanti?ed experiences 272 H. P. McKenna in the context of smarter and more responsive cities. Indeed, Baumer [25] proposes a human-centered algorithm design (HCAD) to address gaps or disconnects between algorithm metrics focused on performance on the one hand and concerns with incorpo- rating “human and social interpretations” on the other. In making algorithmic design more people centered, Baumer [25] identi?es three approaches focused on the theoret- ical, speculative, and participatory. McKenna [26] explores “the three key enrichment mechanisms of awareness, creativity, and serendipity in the context of the IoT and the IoP” pointing to “the potential for a shift to occur” possibly opening new spaces “for the combining of algorithmic and heuristic activities” and the evolving of “algorithmic/ heuristic relationships in smart cities.” 2.4 Conceptualizing People-Aware Quanti?ed Experiences This theoretical background enables formulation of a conceptual framework for more people-aware quanti?ed experiences. As depicted in Fig. 1, the people-technologies-cities dynamic in public spaces, utilizes a combination of the Internet of Things (IoT), the Internet of People (IoP), and the Internet of Experiences (IoE), combining aware people and aware technologies, in the form of responsive, engaging, and evolving mechanisms and approaches contributing to greater awareness, adaptability, and open- ness for fostering future technology spaces with potentials for developing and accom- modating people-aware quanti?ed experiences. Fig. 1. Conceptual framework for people-aware quanti?ed experiences. The research question (Q1) identi?ed in Sect. 1 of this work is reformulated as a proposition for exploration in this paper, as follows P1: People and their multi-sensorial capabilities, in combination with aware technologies, enable the enhancing of sensing, sensors, and the Internet of Things, People, and Experiences contri- buting to greater awareness, adaptability, and openness in support of greater potentials for more creative and people-aware analytic spaces to complement existing approaches to quanti?ed experience in contemporary urban environments. Exploring the Quanti?ed Experience: Finding Spaces for People 273 3 Methodology An emergent, exploratory case study approach was used for this work, said to be partic- ularly appropriate for the study of contemporary phenomena [27]. Contemporary urban environments constituted the case for this study. In Sects. 3.1–3.3 a description of the process followed for this study is provided, the sources of evidence, and the data analysis techniques used. 3.1 Process A website was used to describe the study, invite participation, and enable sign up. Demographic data were gathered during registration for the study including location, age range, and gender. 
People were able to self-identify in one or more categories (e.g., educator, learner, community member, city o?cial, business, etc.). Registrants were invited to complete a survey containing 20 questions as an opportunity to think about smart cities in relation to awareness, adaptability, and openness for improved livability. In-depth interviews with participants enabled discussion of urban experiences and ideas about smart cities. A pre-tested survey instrument was used for this study as well as a pre-tested interview protocol. 3.2 Sources of Evidence This study attracted international interest with participants located mostly in small to medium to large sized cities in Canada (e.g., St. John’s, Ottawa, Greater Victoria), extending also to other countries such as Israel (e.g., Tel Aviv). Survey responses provided the main source of quantitative data for this study while interview data provided qualitative evidence for this study along with data provided in response to open-ended survey questions. Three questions common to both the survey instrument and interview protocol were adapted from Anderson’s [28] body insight scale (BIS), as a mechanism for exploring the human-centered sensing of cities as a form of awareness. By contrast, other scales such as that by Teixiera et al. [29] pertain to human sensing using computing technologies for the detection of elements such as presence, count, location, track, and identity. More appropriate for this study, the BIS scale was designed for “assessing subtle human qualities” and this body insight scale [28], formerly the body intelligence scale [28], consists of three subscales—energy body awareness (E-BAS); comfort body awareness (C-BAS); and inner body awareness (I-BAS). Anderson encourages use of the scale in other domains and as such, the BIS is explored in this work in relation to people and their experience of everyday urban environments. Also of note is the impor- tance of feeling and a?ect in human computer interactions where emotion is considered to be “a critical element of design for human experience” [30], applicable here in the context of smart and responsive cities. The three questions adapted for use in this work correspond to each of the BIS sub-scales and are slightly altered in terms of wording, as follows: 274 H. P. McKenna 1. Regarding your body awareness in your city, would you agree that your body lets you know when your environment is safe (On a scale of 1 to 5 on a continuum of disagree to agree)? 2. Regarding your comfort body awareness in the world, would you agree that you feel comfortable in the world most of the time (On a scale of 1 to 5)? 3. Regarding your inner body awareness in your city, would you agree that you can feel your body tighten up when you are angry (On a scale of 1 to 5)? In parallel with this study, evidence was also gathered through individual and group discussions with people from diverse sectors across multiple cities (e.g., Toronto, Vancouver, and Greater Victoria). Perspectives across the city emerged from those in business (architectural design, ecology, energy, information technology (IT), tourism), government (city councilors, policy makers, IT sta?), educators (secondary and post-secondary, researchers, IT sta?), students (post-secondary – engineering/design/ computing/education/media), and community members (IT professionals, urban engagement leaders, urban designers, and policy in?uencers). 
3.3 Data Analysis
Qualitative data were analyzed using the content analysis technique, involving inductive analysis to identify emerging terms from the data collected, while deductive analysis enabled the identification of terms emerging from the review of the research literature. Data were then analyzed for patterns and emergent insights. Descriptive statistics were used in the analysis of quantitative data. Qualitative evidence gathered from discussions in parallel with this study supported further analysis, comparison, and triangulation of data, contributing further insight and rigor. Overall, data were analyzed for an n = 61 spanning the age ranges of people in their 20s to their 70s, consisting of 39% females and 61% males.

4 Findings
The findings of this paper are presented in terms of the main construct of awareness with attention given to the sub-constructs of adaptability and openness in terms of the proposition explored in this work, in response to the research question.

4.1 Awareness
Regarding technology awareness, City IT staff described the IoT as "more about the instrumentation of things, with everything connected and communicated". A community member in St. John's observed that "we're not smart about how we use the technology". A student noted the pervasive sharing of "very traditional things" and events in daily lives where people are "all videoing them, sharing them constantly in social media," described as "a seamless behavior" contributing to a "seamless interrelationship" of the "local and global" generating "concurrent awareness."
Based on questions adapted for the city in this study from the body insight scale (BIS), an emerging example of a people-aware quantified experience is presented in Table 1. During the 2015 to 2016 phase of this study an abbreviated version of Anderson's 5-point scale was used to assess urban awareness in relation to the energy body and feeling safe; the comfort body; and the inner body and feelings of tightness when angry. Responses from individuals show feelings of safety at the upper end of the scale with 67% at position 4 and 33% at position 5. Feelings of comfort in the world tend toward the high end of the scale with 67% at position 5 and 33% at the neutral position of 3. Feelings of tightness related to anger are spread equally at 33% across the neutral position of 3 and the upper end of the scale at positions 4 and 5.

Table 1. Awareness in the city – body insight scale (2015/2016)
Awareness                         1     2     3     4     5
Energy body: feeling safe                           67%   33%
Comfort body: in the world                    33%         67%
Inner body: tightens when angry               33%   33%   33%

In discussions with respondents about the BIS questions, it was suggested that the term "world" contributed to confusion when assessing levels of comfort in a particular city. Based on this use experience, it was suggested that the phrase "the world" be replaced with "your city." The 5-point scale was also found to be too restrictive and it was suggested that the scale be extended from 5 to 7 points.

Table 2. Awareness in the city – body insight scale (2016/2018)
Awareness                         1     2     3     4     5     6     7
Energy body: feeling safe               33%                           67%
Comfort body                                  67%                     33%
Inner body: tightens when angry                     33%   33%   33%

Guided by feedback from respondents in 2015 to 2016, wording and scale adaptations were pre-tested and approved for use in this study from 2016 going forward. This enriched and emerging example of a people-aware quantified experience is presented in Table 2.
Survey responses from individuals show that feelings of safety continue to emerge at the upper end of the scale in position 7 (67%), with people indicating that their body lets them know when their environment is safe. However, 33% responded at the much lower end of the scale at position 2. During interviews it was possible to discuss the scale rating choices to learn more about underlying factors. Open-ended survey responses also provided additional insight. For example, in the case of those residing outside the city or urban area, the response rate drops sharply toward the lower end of the scale (33%) for feelings of safety during experiences of visiting the city. Regarding comfort levels in the city, responses varied from the high end at one extreme at position 7 (33%) to an increased concentration appearing at the much lower position of 3 on the scale (67%). Where urban comfort levels tended toward the higher end of the scale in cities in 2015 to 2016, comfort levels shifted noticeably in 2016 to 2018 in cities toward 276 H. P. McKenna the lower end of the scale. In part, comfort was in?uenced by urban design elements, such as the placement of benches. Feelings of tension in the city, such as anger, appearing in Table 1 (33% at the 3, 4, and 5 positions) seem to remain relatively consistent with those emerging in Table 2, tending toward the mid to higher positions of the scale with 33% at the 4, 5, and 6 positions. During interviews it was reported that feelings of tense- ness and anger depended upon the city where, in a smaller scale city, the inability to ?nd a parking spot may contribute to anger, while in a much larger urban center such as London, being tense “would be normal” pointing to “a di?erence in how you carry yourself” depending on the city. 4.2 Adaptability Mechanisms and approaches to accommodate new forms of adaptability in urban inter- actions emerged in a variety of ways. For example, an educator in Vancouver described the importance of people coming together in the city where “the meeting becomes the technology that changes everything.” A building designer noted that, “people want to be able to interact and really be in an overall environment” calling for changes in urban design. A community organizer in Victoria observed how City Council members “go where the citizens are” when there is “an opportunity for public engagement.” In the case of wanting “to reengage with our bylaws about growing food on city land,” Council members and/or city sta? will attend “city events” rather than “just posting something on their website” as “a really e?ective way to engage the community.” From a creativity perspective, a community leader articulated the need to ?gure out how to “move away from sector driven strategies to ones that” feature “clusters” so as to “bring industries and sectors together rather than that sort of silo” approach. Cross-sector initiatives were identi?ed related to “connected cities,” while recognizing the potential for, and impor- tance of, funding for smart cities. 4.3 Openness City IT sta? commented that “fundamentally there is a desire to be very, very open with the available data” as public data. It was noted that “the other element we’re trying to share is even just the processes of City Hall” using the example of permit applications. 
A locally developed mobile app was described by an educator in terms of the capability of being "able to open this kind of feedback" potential to anyone in the city as a way "to transform contributions both in terms of unique ideas and patterns into the design of some urban space or buildings" as in "smart infrastructure." A building designer described the focus on creating a "whole urban space" enabling a coming together of people so as "to make it feel like its not this closed in community." The designer suggested the potential for "having buildings or alleyways" serve as "more than just that intended use" so as to become multi-use and multi-purpose spaces. A community leader suggested that, "one of the challenges that the building community faces in doing these things is financial." Reference was made to the importance of planning for "an open innovation event" designed to be "more engaging" inviting proposals to "pilot ideas" to address urban challenges going forward. Regarding social media and openness, a student questioned the veracity of information provided to platforms, pointing to the frequent contributing of "made up" details in an effort to maintain some degree of privacy.
Explored quantitatively, as illustrated in Table 3, when asked to assess the extent to which openness is associated with smart cities on a 7-point scale (1 – Not at all, 2 – Not sure, 3 – Maybe, 4 – Neutral, 5 – Sort of, 6 – Sure, 7 – Absolutely), the majority of responses emerge toward the upper end of the scale with 33% at positions 6 and 7, along with a 33% response at the neutral position of 4.

Table 3. Openness and smart cities – assessments
Smart cities   1     2     3     4     5     6     7
Openness                         33%         33%   33%

Exploring quantitatively the potentials for attuning, sharing, and trust, people were asked to assess these elements in relation to city-focused social media and other aware technologies on a scale of 1 to 7 (not at all to absolutely). As illustrated in Table 4, assessments of attuning to urban spaces tended toward the upper end of the scale with 33% at the 6 position and 67% at position 7. Again, sharing is strong with 67% at the upper end of the scale in position 7 and 33% in position 6. Trust emerges toward the upper end of the scale with 67% of responses at the 5 position and 33% at 7.

Table 4. Attuning, sharing, and trust – assessments
Smart cities   1     2     3     4     5     6     7
Attuning                                     33%   67%
Sharing                                      33%   67%
Trust                                  67%         33%

A summary of findings is presented in Table 5 in terms of the three constructs of awareness, adaptability, and openness in relation to the technologies of the Internet of Things (IoT), the Internet of People (IoP), and the Internet of Experiences (IoE). IoT technologies emerge in relation to awareness as instrumented, as meeting spaces for adaptability, and as mobile apps for openness. IoP technologies highlight awareness in relation to seamless behaviour, as clusters for adaptability, and as piloting ideas across diverse sectors for openness. IoE technologies contribute to multi-dimensional awareness, connected cities for adaptability, and to calls for attention to the veracity of data in social media and other online platforms in relation to openness and associated concerns with privacy in urban spaces.

Table 5. Summary of findings
Tech   Awareness            Adaptability       Openness
IoT    Instrumented         Meeting spaces     Mobile app
IoP    Seamless behavior    Clusters           Piloting ideas
IoE    Multi-dimensional    Connected cities   Veracity/privacy

278 H. P.
McKenna 5 Discussion Awareness-based ?ndings suggest an instrumented, technology perspective from infor- mation technology professionals balanced by community member voices highlighting the importance of being “smart about how we use the technology.” The seamless inter- mingling of the IoT-IoP-IoE emerges in the observations of a student articulating the “concurrent awareness” of the local and the global. The nature of pervasive sharing described in the ?ndings, enriches the quantitative details provided in Table 4 for attuning and sharing. Trust level assessments in Table 4, while relatively strong, suggest an underlying tentativeness with 67% at position 5 and 33% at the upper end of the scale at 7, when compared with responses for attuning and sharing. The multi-dimensionality of the urban experience is highlighted through early-stage use of the body insight scale (BIS) to explore feelings of safety, comfort, and tension levels more directly with people. Early indications of factors in?uencing responses to use of the BIS pertain to city size, urban design elements, familiarity with the city, and other emerging and evolving aspects of cities and city regions that may include density (e.g., increasing urbanization over time) and geographic location. Adaptability-related ?ndings emphasize the importance of ?guring out e?ective ways to bring people together – meetings, clusters, technologies – in support of more community focused approaches to engagement and governance for connected cities. Openness-related ?ndings pertained to the use of an urban app for more inclusive use as smart infrastructure; the piloting of ideas in developing designs for greater connection in multi-use urban spaces; and the veracity of social media and other platform data in the face of underlying privacy concerns, shedding light on Table 3 and quantitative assessments of openness, with implications for quanti?ed experiences. 6 Future Directions Findings from this work highlight the need for qualitative data to augment, complement, and enhance the quantitative data being generated and gathered in urban spaces. Issues related to the veracity of large amounts of data providing the basis for algorithmic activ- ities gives rise to concerns identi?ed here with “made up” details and the resulting e?ect on algorithmic accuracy. As such, this work points to new pathways for the involvement of people more meaningfully and directly in the creation of spaces, both in theory and practice, for interaction in algorithmic realms. Such spaces will contribute to the shaping of debates, algorithmic designs, and new possibilities and potentials for more creative outcomes in the innovating of quanti?ed experiences as more people-aware. 7 Challenges, Limitations, and Mitigations Limitations of this work related to small sample size are mitigated by in-depth and rich detail from a wide range of individuals across small to medium to large urban centers. Challenges related to geographic location are mitigated by the potential to extend this work to other cities, including megacities and regions exceeding 10 million people. The challenge of studying emergent, dynamic, and evolving understandings of smart cities Exploring the Quanti?ed Experience: Finding Spaces for People 279 through awareness, adaptability and openness is mitigated by opportunities to explore the making of openings and spaces for innovative opportunities going forward for quanti?ed experiences. 
While only a limited number of possible body insight scale (BIS) questions were adapted for exploration in this work, opportunities exist for further vali- dation of these questions for use in urban environments going forward and for the inclu- sion of additional questions. 8 Conclusion This paper provides an exploration of the evolving area of aware people and aware technologies in relation to quanti?ed experiences in smart cities. Key contributions of this work include: (a) the use of awareness, adaptability, and openness in relation to the Internet of Things (IoT), the Internet of People (IoP), and the Internet of Experiences (IoE) as aspects of smart cities, in exploring the potential for innovating quanti?ed experiences; (b) formulation of a conceptual framework for people-aware quanti?ed experiences; (c) early-stage exploration of adaptations to the body insight scale (BIS) for use in the study of quanti?ed experiences in contemporary urban environments; and (d) further development of the smart cities research and practice literature in relation to innovations for quanti?ed experiences. A major take away from this work is the critical importance of aware people in combination with aware technologies in fostering new potentials for the making of innovative spaces to accommodate people more meaning- fully and directly in the algorithmic realm in smart cities. This work will be of interest to technology developers, researchers, research think tanks, urban practitioners, community members, and anyone concerned with more creative and innovative quan- ti?ed experience initiatives for future tech, smarter cities, and more responsive cities. References 1. Habitat, U.N.: Urbanization and Development: Emerging Futures—World Cities Report 2016. UN Habitat, Nairobi (2016) 2. Konomi, S., Roussos, G.: Enriching Urban Spaces with Ambient Computing, the Internet of Things, and Smart City Design. IGI Global, Hershey (2017) 3. Goldsmith, S., Crawford, S.: The Responsive City: Engaging Communities Through Data- Smart Governance. Jossey-Bass, San Francisco (2014) 4. Hotho, A., Stumme, G., Theunis, J.: Introduction: new ICT-mediated sensing opportunities. In: Loreto, V., Haklay, M., Hotho, A., Servedio, V.D.P., Stumme, G., Theunis, J., Tria, F. (eds.) Participatory Sensing, Opinions and Collective Awareness, pp. 3–8. Springer, Cham (2017) 5. Friberg, C.: Performing everyday practices: atmosphere and aesthetic education. Ambiances Int. J. Sens. Environ. Archit. Space Var. 464, 1–12 (2014) 6. Lévy, J. (ed.): The City: Critical Essays in Human Geography. Contemporary Foundations of Space and Place Series. Routledge, London (2016) 7. Townsend, A.M.: Smart Cities: Big Data, Civic Hackers and the Quest for a New Utopia. WW Norton, New York (2013) 280 H. P. McKenna 8. Kyriazopoulou, C.: Architectures and requirements for the development of smart cities: a literature study. In: Elfhert, M., et al. (eds.) Smartgreens 2015 and Vehits 2015, CCIS 579, pp. 75–103. Springer, Cham (2015) 9. Gi?nger, R., Fertner, C., Kramar, H., Kalasek, R., Pichler-Milanovic, N., Meijers, E.: Smart Cities: Ranking of European Medium-Sized Cities. University of Technology, Vienna (2007) 10. Gil-Garcia, J.R., Puron-Cid, G., Zhang, J.: Conceptualizing smartness in government: an integrative and multi-dimensional view. Gov. Inf. Q. 33(3), 524–534 (2016) 11. Khatoun, R., Zeadally, S.: Smart cities: concepts, architectures, research opportunities. Commun. ACM 59(8), 46–57 (2016) 12. 
Herzberg, C.: Smart Cities, Digital Nations: How Digital Urban Infrastructure can Deliver a Better Life in Tomorrow’s Crowded World. Roundtree Press, Petaluma (2017) 13. Vilarinho, T., Farshchian, B.A., Floch, J., Mathisen, B.M.: A communication framework for the Internet of People and Things based on the concept of activity feeds in social computing. In: Proceedings of the 9th International Conference on Intelligent Environments, pp. 1–8 (2013) 14. Li, M.: Editorial: Internet of People. Concurr. Comput. Pract. Exp. 29, 1–3 (2017) 15. Miranda, J., Mäkitalo, N., Garcia-Alonso, J., Berrocal, J., Mikkonen, T., Canal, C., Murillo, J.M.: From the Internet of Things to the Internet of People. IEEE Internet Comput. 19(2), 40– 47 (2015) 16. Conti, M., Passarella, A., Das, S.K.: The Internet of People (IoP): a new wave in pervasive mobile computing. Pervasive Mob. Comput. 41, 1–27 (2017) 17. McKenna, H.P.: Edges, surfaces, and spaces of action in 21st century urban environments— connectivities and awareness in the city. In: Kreps, D., Fletcher, G., Gri?ths, M. (eds.) Technology and Intimacy: Choice or Coercion, Advances in Information and Communication Technology, vol. 474, pp. 328–343. Springer, Cham (2016) 18. Wellsandt, S., Wuest, T., Durugb, C., Thoben, K.D.: The Internet of Experiences—towards an experience-centred innovation approach. In: Emmanouilidis, C., Taisch, M., Kiritsis, D. (eds.) Advances in Production Management Systems, Competitive Manufacturing for Innovative Products and Services, APMS 2012. IFIP Advances in Information and Communication Technology, vol. 397, pp. 669–676. Springer, Berlin (2013) 19. Casini, M.: Green technology for smart cities. In: IOP Conference Series: Earth and Environmental Science, vol. 83, p. 012014, 2nd International Conference on Green Energy Technology, pp. 1–10 (2017) 20. Falcon, R., Hamamoto, B.: Bodies of Data: Who are We Through the Eyes of Algorithms. Future Now. Institute For The Future (IFTF), Palo Alto (2017) 21. Amabile, T.M.: Componential theory of creativity. In: Kessler, E.H. (ed.) Encyclopedia of Management Theory. Sage, Los Angeles (2013) 22. Dourish, P.: Algorithms and their others: algorithmic culture in context. In: Big Data and Society, pp. 1–11 (2016) 23. McKenna, H.P., Arnone, M.P., Kaarst-Brown, M.L., McKnight, L.W., Chauncey, S.A.: Application of the consensual assessment technique in 21st century technology-pervasive learning environments. In: Proceedings of the 6th International Conference of Education, Research and Innovation (iCERi2013), pp. 6410–6419 (2013) 24. McKenna, H.P., Chauncey, S.A.: Exploring a creativity assessment technique for use in 21st century learning, library, and instructional collaborations. In: Proceedings of the 8th International Conference of Education, Research and Innovation (iCERi), pp. 5371–5380 (2015) 25. Baumer, E.P.S.: Toward Human-Centered Algorithm Design. In: Big Data & Society, pp. 1– 12 (2017) Exploring the Quanti?ed Experience: Finding Spaces for People 281 26. McKenna, H.P.: Creativity and ambient urbanizing at the intersection of the Internet of Things and People in smart cities. In: Universal Access in Human–Computer Interaction, Virtual, Augmented, and Intelligent Environments. Lecture Notes in Computer Science, vol. 10908. Springer, Cham (2018) 27. Yin, R.K.: Case Study Research and Applications: Design and Methods. Sage, Los Angeles (2018) 28. Anderson, R.: Body Intelligence Scale: de?ning and measuring the intelligence of the body. Hum. Psychol. 34(4), 357–367 (2006) 29. 
Teixiera, T., Dublon, G., Savvides, A.: A survey of human-sensing: methods for detecting presence, count, location, track, and identity. ENALAB Technical Report 09-2010, vol. 1, no. 1 (2010) 30. Hanington, B.: Design and emotional experience: introduction. In: Jeon, M. (ed.) Emotions and Affect in Human Factors and Human–Computer Interaction, pp. 165–183. Elsevier, London (2017)

Prediction of Traffic-Violation Using Data Mining Techniques
Md Amiruzzaman
Kent State University, Kent, OH 44242, USA
mamiruzz@kent.edu

Abstract. This paper presents the prediction of traffic-violations using data mining techniques, more specifically, when a traffic-violation is most likely to happen. Also, the contributing factors that may cause more damage (e.g., personal injury, property damage, etc.) are discussed in this paper. The national database for traffic-violations was considered for the mining, and the analyzed results indicated that a few specific times are probable for traffic-violations. Moreover, most accidents happened on specific days and times. The findings of this work could help prevent some traffic-violations or reduce the chance of occurrence. These results can be used to increase caution and traffic-safety tips.

Keywords: Traffic · Prediction · Crime · Violations · Data mining

1 Introduction
According to [1], the approximate population of the US is 326,200,000, and there are 196,000,000 licensed drivers [2]. However, based on the data presented in [2], on average 112,000 tickets are issued every day for different types of traffic-violations (mainly speeding). Altogether, approximately 41,000,000 tickets are issued every year (see Table 1). These statistics provide an overview of traffic-violations in the US, and there are a number of reasons that cause traffic-violations. As the number of vehicles increases every day, so does the chance of traffic-violations [3,4]. Often, traffic-violations lead to road accidents and injuries (Chen et al. 2004; Nath 2006). Chen et al. [3] classified different types of crime at different law-enforcement levels, such as sex crime at law-enforcement level two, and theft (e.g., robbery, burglary, larceny, etc.) at law-enforcement level three. In their classification, traffic-violation is one of the common local crimes [3]. In general, bad weather, unskilled drivers, drunk drivers, and drivers who pay less attention while driving may cause traffic-violations, as well as road accidents. However, there may be other contributing reasons that lead to traffic-violations and road accidents, for example, speeding, reckless driving, driving under the influence of drugs or alcohol, hit-and-run, road rage, etc. The research in [3] mainly focused on crimes and who is committing them, rather than traffic-violations.
© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 283–297, 2019. https://doi.org/10.1007/978-3-030-02686-8_23

Table 1. Traffic-violation statistics
Driving citation statistics
Average number of people per day that receive a speeding ticket         112,000
Total annual number of people who receive speeding tickets              41,000,000
Total percentage of drivers that will get a speeding ticket this year   20.6%

Solomon et al. (2006) analyzed traffic-violation data to develop a traffic safety program [4]. Their research focused on identifying places where traffic-violations occurred and how to better monitor those places. Solomon et al.
(2006) proposed to use more camera/surveillance to monitor those identi?ed high tra?c-violation places and use those surveillance footages to identify responsible parties [4]. This research [4] helped to improve tra?c-safety programs. In a separate study, Saran and Sreelekha (2015) found correlations between drunk driver, careless driving, over the speed limit and road accidents [5]. How-ever, these ?ndings are not something new to the law-enforcement agencies and research communities. Moreover, [5] mainly focused on statistical analysis (i.e., correlation analysis) and surveillance. In their paper, Saran and Sreelekha [5] used Arti?cial Neural Network (ANN) for vehicle detection. They also focused on Intelligent Transport System (ITS), which incorporate latest computer tech-nologies and computer vision [5]. Saran and Sreelekha (2015) indicated that ANN is superior in classi?cying moving vehicles than Support Vector Machine (SVN) and k-nearest neighbor (k-nn) algorithms. Note that, SVN and k-nn are two most popular algorithms that are widely used in data mining. Gupta, Mohammad, Syed and Halgamuge (2016) found a correlation between crime rates and accidents from Denver city of Colorado state [6]. Note that tra?c-violations may lead to violent crimes as well. For example, drunk driver may cause some property damage or injury to others. From their mining research, Gupta et al. (2016) were able to predict that in the months of January and February, most crimes are likely to occur. These ?ndings were helpful to the law-enforcement agencies (Gupta et al. 2016). The major drawback of [6] research is that authors only focused on one speci?c city of a state. Analyzing national database is necessary to understand how tra?c-violations occurring in the US. Nath (2006) indicated that most criminals along with other crimes, com-mitted tra?c-violation crimes as well [7]. One of the interesting ?ndings from Nath (2008) was to claim that 10% criminals commits 50% of the crimes. Chen et al. (2004) mentioned that a tra?c-violation is a primary concern for city, county, and state level law-enforcement agencies. In [7], authors mainly focused on where and how many Closed-Circuit Television (CCTV) would be helpful to ?nd responsible parties. The purpose of this study is to predict tra?c-violations based on previous incidents. The national database for tra?c violations is to be examined to deter-mine any factors that contributed to previous tra?c-violations and developed the prediction. Also, what time and days are most violations occur will be deter-mined using the mining as well. Prediction of Tra?c-Violation Using Data Mining Techniques 285 The rest of this paper is organized as follows: Sect. 2 describes existing litera-tures. Section 3 describes the method used in this study and Sect. 4 summarizes the experimental results. Section 5 presents discussion about the experimental results and Sect. 6 concludes the paper with implications and future works. 2 Literature Review Chen et al. (2004) studied di?erent types of crime, such as tra?c-violations, sex crime, theft, fraud, arson, gang/drug o?enses, violent crime, and cybercrime [3]. Also, they classi?ed these crime types to di?erent law-enforcement levels (e.g., level one, level two, etc.). Chen et al. (2004) identi?ed tra?c-violations as level one crime and one of the common local crimes [3]. 
They mentioned that speeding, reckless driving, causing property damage or personal injury in a collision, driving under in?uence of drugs or alcohol, hit-and-run, and road rage are common reasons for tra?c-violations [3]. According to Chen et al. (2004), tra?c-violations mostly considered as less harmful crime, however, sometimes this type of crime could cause severe bodily injury or property damage [3]. Even though, Chen et al. [3] discussed about tra?c-violation and other crimes, but their work actually did not focus on tra?c-violation analysis. Rather, their work focused on other types of crime analysis and prediction of those crimes to help law-enforcement agencies. Solomon, Nguyen, Liebowitz and Agresti (2006) demonstrated how to use data mining (DM) and evaluate cameras that monitor red-light-signals in traf-?c intersections [4]. Based on their ?ndings they proposed some techniques to improve tra?c safety programs. In their work, they used di?erent modeling techniques, such as decision trees, neural networks, market-basket analysis, and k-means. Solomon et al. (2006) focused on identifying places where red-light-signal violations occurred and how to better monitor those places. The red-light violation is known as red light running (RLR), and according to the Federal Highway Administration (FHWA), approximately 1,000 Americans were killed and 176,000 were injured in 2003 because of RLR. To describe the severity of RLR and its damage on the economy, Solomon et al. (2006) in [4] wrote, “The California Highway Patrol estimates that each RLR fatality costs the United States $2,600,000 and other RLR crashes cost between $2,000 and $183,000, depending on severity (California State Auditor, 2002)” (p. 621). As for the recommendation, they proposed to use more cam-era/ surveillance to monitor those identi?ed high tra?c-violation places and use those surveillance footages to identify responsible parties. As for their data, they used tra?c-violation data from Washington, DC area; the data was collected between the year 2000 and 2003 (Solomon et al. 2006). In terms of ?ndings, their [4] work helped law-enforcement agencies to ?nd responsible parties using the red light camera (RLC). However, placing RLCs in a right place is not an easy task. Data mining technique can be helpful to determine the high accident zone and place RLCs in appropriate locations. In a separate study [5], Saran and Sreelekha (2015) found correlations between drunk driver, careless driving, over the speed limit and road accidents. 286 Md. Amiruzzaman However, these ?ndings are not something new to the law-enforcement agencies and to the research communities [5]. Their work [5] was more of a classi?ca-tion than data mining. They used videos obtained from closed circuit television (CCTV) cameras placed in roadsides or driveways are used for the surveillance. They used arti?cial neural networks (ANN) to detect di?erent types of vehi-cles [5]. While detecting di?erent types of vehicles are important and interesting work, however, the need for tra?c-violation data mining remain the unsolved. In their work [5], Saran and Sreelekha (2015) mainly focused on road safety and surveillance system. Gupta, Mohammad, Syed and Halgamuge (2016) found a correlation between crime rates and accidents from Denver city of Colorado state. Note that tra?c-violations may lead to violent crimes as well [6]. For example, drunk driver may cause some property damage or injury to others. 
To describe the phe-nomenon, they said in [6] “The major cause of road accidents is drink driving, over speed[ing], carelessness, and the violation of tra?c rules” (p. 374). From their mining research, Gupta et al. (2016) were able to predict that in the months of January and February, most crimes are likely to occur. These ?ndings were helpful to the law-enforcement agencies (Gupta et al. 2016). They used data from the National Incident-Based Reporting System (NIBRS), The dataset con-tained 15 attributes and 372,392 instances [6]. While, Gupta et al. (2016) in [6] presented interesting ?ndings based on their data mining research, however, their work is mainly focused on a speci?c city of a speci?c state. It is important that a research study focus on the entire US and try to generalize the ?ndings mentioned in [6]. Nath (2006) in [7] indicated that most criminals along with other crimes, committed tra?c-violation crimes as well. One of the interesting ?ndings from Nath (2008) was to claim that 10% criminals commits 50% of the crimes. Chen et al. (2004) mentioned that a tra?c-violation is a primary concern for city, county, and state level law-enforcement agencies. They also added that tra?c-violations and other criminal activities may be related, and information obtained from tra?c-violations can be further used to ?nd criminals. They focused on getting contact information from the Department of Motor Vehicles (DMV). This paper, will provide an overview of tra?c-violation data mining as well as some interesting ?ndings that can be helpful to maintain cautions and prevent unwanted tra?c-violations. The proposed data mining predicts where and what time of the day the incidents (tra?c-violations) will occur based on National database. Also, what combinations of factors contribute to tra?c-violations. 3 Method Several data mining algorithms were used to analyze the data. For example, Na¨ive Bayes, J48 decision tree, Decision Table, and Support Vector Machine. Also, a few statistical analysis, such as, linear regression analysis, correlation analysis, and reliability analysis were considered to analyze the ?nal data. Mul-tiple tools were used to process and analyze the data. For example, SPSS Prediction of Tra?c-Violation Using Data Mining Techniques 287 (i.e., Statistics is a software package developed by IBM company) tests helped to determine which attributes should be considered for data mining. Also, WEKA1 (i.e., Waikato Environment for Knowledge Analysis) tool was used to perform data mining algorithms [8] on the research dataset. 3.1 Data The data was downloaded from the national database for public data2 . The original database consists of 36 attributes. However, there were lots of attributes that did not show any variations. For example, the accident attribute only had “No” as a value. Attributes like that does not contribute to data analysis, so, those attributes were deleted before the ?nal analysis. The database consisted over one million records. Of course, some of the rows had some missing values or wrong values (e.g., human errors). Missing values and wrong values seemed to be due to user errors. The database included demographic information, such as, gender of vehicle drivers, and place of incidents, driver state, driver city, etc. 3.2 Preprocessing The initial task for the preprocessing was to identify which attribute to keep and which attributes to discard. Of course, the database included overwhelming amount of data. 
However, for the data mining, only the most important and relevant attributes were considered for ?nal analysis. The preprocessing process included deleting missing data, deleting irrelevant attributes, modifying records to meaningful format, etc. – SPSS tests helped to determine which attribute could to be deleted or not included for data mining as well as ?nal analysis (see Table 2). – Missing and repeating attributes were discarded as well. Also, wrong entries were discarded from ?nal selection of data analysis. – The dataset was divided into training set and testing set. The training set consisted 67% of the data, whereas testing test consisted of 33% of the total number of records. Holdout method was used to determine the training set and testing set. Initial Processing. After the determining the training set and testing set, and deciding to keep some candidate attribute. Again, SPSS tests were executed to determine which attribute should be deleted to further increase the accuracy of the result. Mainly the test helped to determine which item should be deleted is “items-deleted” to increase the reliability value. For example, SPSS tests indi-cated time of the incident should be deleted to increase the reliability of the results. 1 https://www.cs.waikato.ac.nz/~ml/weka/downloading.html. 2 https://catalog.data.gov/dataset. 288 Md. Amiruzzaman Table 2. Inter-Item correlation matrix Personal injury Property damage Alcohol Contributed to accident Personal injury 1.000 -o0.016 0.013 0.346 Property damage -o0.016 1.000 0.019 0.368 Alcohol 0.013 0.019 1.000 0.014 Contributed to accident 0.346 0.368 0.014 1.000 Initial Results. Initial processing suggested that most tra?c-violations hap-pened in Maryland (DC), more speci?cally in Washington, DC area. Also, after modifying the date of incident to weekdays (e.g., Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday), it was noticed that most tra?c-violations happened on Tuesday and Wednesday (see Fig. 1.). This is maybe because people are more anxious on mid-week (i.e., we call it mid-week e?ect). Fig. 1. Number of incidents in days. (x-axis is days–Sunday (starting from left), and end with Saturday (on the right); y-axis is the number of incidents). 4 Results 4.1 SPSS Correlation analysis helped to determine that property damage and alcohol were correlated (17%). Similarly, contributed to accident and property damage Prediction of Tra?c-Violation Using Data Mining Techniques 289 were correlated (34%); contributed to accident and personal injury were corre-lated (37%). The correlation values were calculated using the following equation (see (1)): rxy = .xi=0 n (xi -i x¯)(yi -i y¯) .¯ .¯i=0 n (xi -i x¯)2 .¯i=0 n (yi -i y¯)2 (1) where, rxy is the correlation value between variables, x and y, ., is the symbol for “sum up”, xi is the individual value of variable x, x¯ is the mean of variable x. Similarly, yi is individual value of variable y, y¯ is the mean of variable y. In this analysis linear regression was used to verify some of the prediction made by the WEKA software. The regression equation can be expressed as (see (2)) yi = a + bxi + c (2) where, Y is the dependent variable that the equation tries to predict, X is the independent variable that is being used to predict Y , xi ?i X, and i = 1, 2, 3, ..., n, yi ?i Y , and i = 1, 2, 3, ..., n, a is the Y -intercept of the line, b is the slope, and c is a value called the regression residual, which can be calculated by |yˆi -ˆ yi|, where yˆi is the expected value of y. 
More detail about the regression equation and examples of regression can be found online3. The results obtained from the linear regression analysis are presented in Table 3.
3 http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

Table 3. Linear regression analysis
Model   R       R^2     Adjusted R^2   Std. error of the estimate
1       0.404   0.163   0.163          0.125

Reliability values were calculated using the equation below (see (3)):

\alpha = \frac{N \cdot \bar{c}}{\bar{v} + (N - 1)\cdot \bar{c}}    (3)

where N is the number of items, \bar{c} is the average inter-item covariance, and \bar{v} is the average variance. The reliability of the four attributes (i.e., personal injury, property damage, alcohol, and contributed to accident) was 0.435 (see Table 4).

Table 4. Reliability statistics
Cronbach's alpha   Cronbach's alpha based on standardized items   N of items
0.435              0.362                                          4

4.2 Naïve Bayes
The Naïve Bayes classifier is one of the most popular classifiers in data mining. To describe the strength of Naïve Bayes, [9] wrote "The naïve Bayes classifier computes the likelihood that a program is malicious given the features that are contained in the program. This method used both strings and byte-sequence data to compute a probability of a binary's maliciousness given its features" (p. 6). Results obtained from Naïve Bayes are presented in Table 5.

Table 5. Comparisons of different methods
Method name                    Correctly classified (%)   Incorrectly classified (%)   Kappa statistics   Root Mean Square Error (RMSE)   Precision   Recall
J48 decision tree              97.67                      2.32                         0.24               0.14                            0.98        0.99
Naïve Bayes                    97.60                      2.39                         0.06               0.13                            0.97        0.99
Support Vector Machine (SVM)   97.61                      2.38                         0.00               0.15                            0.97        1.00
Decision table                 97.64                      2.35                         0.24               0.13                            0.98        0.99

The following mathematical definition helps to explain how the Naïve Bayes classifier works. Let the dataset be d, the set of classes C = c_1, c_2, ..., c_n, and the predicted class c \in C. The Naïve Bayes classification can be expressed as (see (4)):

P(c|d) = \frac{P(d|c)\,P(c)}{P(d)}    (4)

Over 500,000 instances were analyzed using Naïve Bayes (WEKA could not return any results for more than 0.5 million records), with 67% of them as the training set and 33% as the testing set.
The confusion matrix helped to compute the accuracy of the classifying algorithms. The accuracy of a classifying algorithm can be defined as (see (5)):

\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}    (5)

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative. With 97.6% accuracy, the Naïve Bayes algorithm was able to classify traffic-violations with respect to personal injury, property damage, and the presence of alcohol. The confusion matrix of Naïve Bayes shows that only 297 records were classified as "True Negative" (see Table 6).

Table 6. Confusion matrix (Naïve Bayes)
                Predicted: No              Predicted: Yes
Actual: No      True positive = 327107     False negative = 331
Actual: Yes     False positive = 7715      True negative = 297

In the database, different types of vehicles were reported, for example, motorcycle, automobile, station wagon, limousine, etc. The Naïve Bayes algorithm was able to classify traffic-violations based on vehicle type with an accuracy of 87.444%. Also, the Naïve Bayes algorithm reported that automobile had the highest incident records.
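To make the evaluation setup above concrete, the following sketch shows how a 67/33 holdout split and a Naïve Bayes run could be reproduced with the WEKA Java API. The file name violations.arff and the position of the class attribute are assumptions, and the paper's exact attribute selection is not reproduced here; this is a hedged sketch, not the author's script.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

/** Hedged sketch: 67/33 holdout evaluation of Naive Bayes with WEKA. */
public class NaiveBayesHoldout {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export of the traffic-violation records.
        Instances data = DataSource.read("violations.arff");
        data.setClassIndex(data.numAttributes() - 1);   // assume the class attribute is last

        // Shuffle, then split 67% training / 33% testing (holdout method).
        data.randomize(new Random(1));
        int trainSize = (int) Math.round(data.numInstances() * 0.67);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(nb, test);
        System.out.println(eval.toSummaryString());   // accuracy, kappa, RMSE, ...
        System.out.println(eval.toMatrixString());    // confusion matrix (cf. Table 6)
    }
}
```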
4.3 J48
The J48 decision tree algorithm was used to visualize and determine how the prediction was made. The J48 algorithm uses a mathematical model to determine information gain, which helps to determine which variable fits better in terms of predicting the target variable. Other data mining research, such as [10], used the J48 decision tree to predict their outcome variables as well. The following mathematical definition helps to explain how the decision tree classifier works. Let the dataset be d. The dependent variable is Y (i.e., the target variable that the algorithm is trying to classify). The dataset d consists of a vector x, which is composed of the features x_1, x_2, x_3, ... that are used to make the classification or the decision tree. Then, the decision tree algorithm can be expressed as (see (6)):

(x, Y) = (x_1, x_2, x_3, ..., x_k, Y)    (6)

where k is the number of features in vector x.
Around 5:00 pm, the traffic-violations that happened did not involve alcohol, which makes sense as most people leave work at that time. However, perhaps the rush to get home may cause those traffic-violations at that time. On the other hand, most traffic-violations between 12:00 am and 1:00 am involved alcohol, which indicates that those were caused by drunk drivers. Perhaps law-enforcement agencies should look into those incidents and maintain more caution. The J48 algorithm achieved 97.6% correct classification. The confusion matrix of J48 shows that only 1290 records were classified as "True Negative" (see Table 7).

Table 7. Confusion matrix (J48)
                Predicted: No              Predicted: Yes
Actual: No      True positive = 326350     False negative = 1088
Actual: Yes     False positive = 6722      True negative = 1290

In addition, the J48 algorithm was able to classify traffic-violations based on vehicle type with an accuracy of 87.433%. Also, the J48 algorithm reported that automobile had the highest incident records.

4.4 Support Vector Machine (SVM)
The support vector machine (SVM) is one of the powerful data classification tools. The SVM was invented at AT&T Bell Laboratories by Cortes and Vapnik in 1997 [11]. To describe the strength of the SVM classification algorithm, Kim, Pang, Je, Kim, Bang and Yang (2003) in [11] wrote, "The SVM learns a separating hyperplane to maximize the margin and to produce a good generalization ability" (p. 2757). Witten and Frank [12] mentioned, "Support vector machines select a small number of critical boundary instances called support vectors from each class and build a linear discriminant function that separates them as widely as possible" (p. 188).
The following mathematical definition helps to explain how the SVM classifier works. Let the dataset be d, the set of classes C = c_1, c_2, ..., c_n, and the predicted class c \in C. Also, let the input set be X = x_1, x_2, ..., x_n with x \in X. Here, X is the input and C is the output. Now, if we want to classify c = f(x, \alpha), where \alpha are the parameters of the function, then the SVM can be expressed as (see (7)):

f(x, \{w, b\}) = \mathrm{sign}(w \cdot x + b)    (7)

where w is the weight and b is the bias.
The SVM algorithm was able to classify traffic-violations based on vehicle type with an accuracy of 87.433%. It also reported that automobile had the highest incident records. The confusion matrix shows the accuracy of the SVM classifier (see Table 8).

Table 8. Confusion matrix (SVM)
                Predicted: No              Predicted: Yes
Actual: No      True positive = 327438     False negative = 0
Actual: Yes     False positive = 8012      True negative = 0
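For completeness, a single harness can loop over all four models compared in Table 5: J48, Naïve Bayes, WEKA's SMO (its SVM implementation), and the Decision Table model described in the next subsection, printing the same summary metrics. This is a hedged sketch under the same assumptions as before (hypothetical file name, class attribute last), not the author's script.

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SMO;
import weka.classifiers.rules.DecisionTable;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

/** Hedged sketch: evaluate the four classifiers of Table 5 on one holdout split. */
public class ClassifierComparison {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("violations.arff");   // hypothetical file name
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(1));

        int trainSize = (int) Math.round(data.numInstances() * 0.67);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

        Classifier[] models = { new J48(), new NaiveBayes(), new SMO(), new DecisionTable() };
        for (Classifier model : models) {
            model.buildClassifier(train);
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(model, test);
            // Columns roughly matching Table 5: accuracy, kappa, RMSE, precision, recall.
            System.out.printf("%-15s acc=%.2f%% kappa=%.2f rmse=%.2f p=%.3f r=%.3f%n",
                    model.getClass().getSimpleName(),
                    eval.pctCorrect(), eval.kappa(), eval.rootMeanSquaredError(),
                    eval.precision(0), eval.recall(0));
        }
    }
}
```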
This type of method generates rules of associations from the data and groups the data or classi?es the data. The decision table uses best-?rst search and cross-validation for evaluation [12]. Here, the symbol “ def = ” represents de?ning relationship. Let, f(x) def = x + 1 de?nies the ralationship of x with function f. In terms of predicting relationship using DT can be de?ned as (see (8)): R(x, y) def = y = x (8) where, R is relationship function between x and y. Which indicates that some y helps to predict x. DT algorithm was able to classify tra?c-violations based on vehicle type with accuracy of 87.451%. The DT analysis reported that automobile had the highest incident records. The confusion matrix shows the accuracy of SVM classi?er (see Table 9). Table 9. Confusion matrix (Decision table) Predicted class No Yes Actual class No True positive = 326203 False negative = 1235 Yes False positive = 6664 True negative = 1348 294 Md. Amiruzzaman 5 Discussion 5.1 Learning from the Data Processing The original data was download as comma-separated values (CSV) ?le. However, I was important that csv ?le should be converted to WEKA supported ?le for-mat. A Java program was written to csv ?le to Attribute-Relation File Format (ar?) ?le format. During the conversion process, it was discovered that ar? ?le is sensitive to date format. What format is used in the ?le should be explicitly mentioned in the original ar? ?le, otherwise WEKA software cannot recognize the data type. During the data processing and analyzing from visualization tool provided by WEKA, it was discovered that WEKA support csv ?le as input as well. In order to make sense of time of incident, time attribute was discretized to nearest hour value. So, all time was discretized to 24-hour format, excel function was used to accomplish this task (e.g., MROUND(B2, “1:00”)). Also, during the presentation and feedback from experts, it was suggested to include date of the incident. However, date was not much informative. So, date was converted to day; built-in excel function was used to convert date to day number (e.g., WEEKDAY(A2), and then format was changed to dddd to get the day). During the analysis ?h value was calculated; ?a value measures relative improve-ment over random predictor. The ?h statistics was computed using following equa-tion (see (9)): ?e = Dobserved -e Drandom Dperfect -e Drandom (9) In terms of success, precusion and recall values were calculated as well. For precision (10) was used. precision = TP TP + FP (10) where, number of true positive is TP, and number of false positive is FP. Comparisons of di?erent algorithm in terms of precision is shown in Table 10. Table 10. Precision comparison Na¨ive Bayes J48 SVM Decision table 0.977 0.980 0.976 0.980 For recall value (11) was used. recall = TP TP + FN (11) where, number of true positive is TP, and number of false negative is FN. Prediction of Tra?c-Violation Using Data Mining Techniques 295 Comparisons of di?erent algorithm in terms of recall is shown in Table 11. Table 11. Recall comparison Na¨ive Bayes J48 SVM Decision table 0.999 0.997 1.000 0.996 After obtaining precision and recall values, F -n statistics was computed (see (12)). F -e statistics = 2 × recall × precision recall + precision (12) Comparisons of di?erent algorithm in terms of F -f statistics is shown in Table 12. All algorithms provided same F -l statistics value. Table 12. 
To evaluate the prediction accuracy, the root mean-squared error (RMSE) was computed (see (13)):

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 }   (13)

where y_i is the observed value for the ith observation and ŷ_i is the predicted value. A comparison of the different algorithms in terms of root mean-squared error is shown in Table 13.

Table 13. Root mean-squared error (RMSE) comparison

Naïve Bayes   J48     SVM     Decision table
0.132         0.143   0.152   0.131

Fig. 2. Number of traffic violations over 24 h (x-axis: hour of day, from 0/24 on the left through 23 on the right; y-axis: number of incidents).

6 Conclusion

The results obtained from the data mining and statistical analysis suggest that personal injury was almost always involved when the driver was drunk. Also, around 1:00 am was the most dangerous time to be out (see Fig. 2); most property damage and personal injury caused by drunk drivers happened between 11:00 pm and 1:00 am, which was also the time when most incidents occurred. Among all the cities, the DC area appeared to be most consistent with these results, so it is advisable to avoid that area during these hours. Analyzing more data and the latest databases from law-enforcement agencies could help to find more interesting information, and using different data mining algorithms could help to understand the data better. Having a domain expert could be beneficial for interpreting the findings and adding further implications. As future work, visualization techniques can be used to show the intensity of traffic violations over geographic locations and accident-prone areas. Moreover, deep learning can be applied to identify or classify areas based on their violation probability.

Acknowledgment. The author would like to thank the open data website (https://catalog.data.gov/dataset) for making the dataset available for research and analysis. A special thank you to those who participated in the initial presentation and provided valuable feedback (part of this paper was presented and submitted as a class project). Thanks also to Dr. Kambiz Ghazinour for helping me think further about the data and the analysis process.

References

1. Estimates, A.P.: U.S. and world population clock (2017). Accessed 19 Nov 2017
2. Statistics Brain: Driving Citation Statistics (2016). Accessed 20 Nov 2017
3. Chen, H., Chung, W., Xu, J.J., Wang, G., Qin, Y., Chau, M.: Crime data mining: a general framework and some examples. Computer 37(4), 50–56 (2004)
4. Solomon, S., Nguyen, H., Liebowitz, J., Agresti, W.: Using data mining to improve traffic safety programs. Ind. Manag. Data Syst. 106(5), 621–643 (2006)
5. Saran, K.B., Sreelekha, G.: Traffic video surveillance: vehicle detection and classification. In: 2015 International Conference on Control Communication and Computing India (ICCC) (2015)
6. Gupta, A., Mohammad, A., Syed, A., Halgamuge, M.N.: A comparative study of classification algorithms using data mining: crime and accidents in Denver City the USA. Education 7(7), 374–381 (2016)
7. Nath, S.V.: Crime pattern detection using data mining. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops, WI-IAT 2006 Workshops, pp. 41–44 (2006)
8. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
9. Schultz, M.G., Eskin, E., Zadok, F., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: 2001 IEEE Symposium on Security and Privacy, S&P 2001 Proceedings, pp. 38–49. IEEE (2001)
10. Olson, D.L., Delen, D., Meng, Y.: Comparative analysis of data mining methods for bankruptcy prediction. Decis. Support Syst. 52(2), 464–473 (2012)
11. Kim, H.C., Pang, S., Je, H.M., Kim, D., Bang, S.Y.: Constructing support vector machine ensemble. Pattern Recognit. 36(12), 2757–2767 (2003)
12. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Elsevier Inc., Amsterdam (2005)

An Intelligent Traffic Management System Based on the Wi-Fi and Bluetooth Sensing and Data Clustering

Hamed H. Afshari1(✉), Shahrzad Jalali2, Amir H. Ghods1, and Bijan Raahemi2
1 SMATS Traffic Solutions Inc., Ottawa, ON K1Y 3B5, Canada
h.h.afshari@gmail.com
2 Knowledge Discovery and Data Mining Lab, Telfer School of Management, University of Ottawa, 55 Laurier Ave. E, Ottawa, ON K1N 6N5, Canada

Abstract. This paper introduces an automated clustering solution that applies to Wi-Fi/Bluetooth sensing data for intelligent route planning and city traffic management. The solution is based on sensing Wi-Fi and Bluetooth MAC addresses, preprocessing the collected real data, and implementing clustering algorithms for noise removal. Clustering is used to recognize Wi-Fi and Bluetooth MAC addresses that belong to passengers traveling by a public transit bus. The main objective is to build an intelligent system that automatically filters out MAC addresses that belong to persons located outside the bus for different routes in the city of Ottawa. This system alleviates the need for defining restrictive thresholds that might reduce the accuracy, as well as the range of applicability, of the solution for different routes. Various clustering models are built to filter out the noise based on four features: the average of the signal strength, its variance, the number of detections, and the travel time. We compare the performance of clustering using the Silhouette analysis and the Homogeneity-Completeness-V-measure score. We conclude that the K-means and hierarchical clustering algorithms have a superior performance for clustering.

Keywords: Wi-Fi Bluetooth sensing · Clustering · Intelligent transportation

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 298–312, 2019. https://doi.org/10.1007/978-3-030-02686-8_24

1 Introduction

1.1 Problem Statement

The cost of city congestion in North America was estimated at about $120B in 2012. This is in addition to its negative impacts on the environment, as well as on the economy that relies on the speed and efficiency of mobility. Public urban transit systems provide a convenient and affordable solution for this problem. However, the limited revenue obtained from bus fares limits the number of operating lines for public transit buses. Hence, to overcome the problem of traffic congestion, optimal operational decisions on bus transit planning play a crucial role. Such decisions rely on estimating the number of passengers, identifying their origins and destinations, and optimizing the travel cost. Traditional methods of transit data gathering and transit decision planning were mainly manual, which made them expensive and time-consuming.
Even though some transit companies use data obtained from smart card transactions, those data may only be used to find the origin of passengers, not their destinations or ride time. A new approach for addressing the traffic congestion problem is based on using Wi-Fi and Bluetooth sensing technologies to estimate the number of passengers, as well as their origins and destinations. Nowadays, Bluetooth and Wi-Fi signals are constantly being emitted by smartphones, tablets, and vehicular embedded systems. These signals can be identified by their device's unique Media Access Control (MAC) address. Note that every MAC address is unique to its device and does not change over time. Sensors can detect such information and, moreover, track the device and the individual who moves with that device over time. These individuals can be drivers, passengers of vehicles, pedestrians, or cyclists. The main concern with such technologies is distinguishing the MAC addresses that belong to passengers traveling by the bus from those that belong to individuals outside the bus.

1.2 Literature Review

A large number of studies in recent years have focused on using Wi-Fi and/or Bluetooth sensors to manage traffic congestion. Wi-Fi and/or Bluetooth MAC addresses may be tracked to find the number of individuals in crowded places such as store lines, supermarkets, public buses, stations, etc. Some of these studies were applied to public transportation systems such as buses, trains, and undergrounds, while others focused only on individual vehicles. Wi-Fi and/or Bluetooth sensors may furthermore be used to estimate the origin-destination (OD) of passengers, their wait time, and their travel time. Dunlap et al. [1] used Wi-Fi and Bluetooth sensing technologies to estimate the OD of passengers in transit buses. They mounted sensors on four buses to collect Wi-Fi, Bluetooth, and GPS data over four weeks. They applied some preprocessing steps to the collected data, in addition to numeric thresholds, to remove noise. They moreover estimated OD data of passengers at different bus stops and validated the results using ground-truth bus routes [1]. Ji et al. [2] employed Wi-Fi sensors and boarding data to present a hierarchical Bayesian model for estimating the OD flow matrix and the sampled OD flow data. They evaluated the accuracy of their method empirically on a bus route. Kostakos et al. [3] developed a Bluetooth detection system that records behaviors of passengers. They showed that approximately 12% of passengers carried Bluetooth devices, and they measured the flow of passengers' daily movements with 80% accuracy [3]. Blogg et al. [4] estimated OD data using MAC addresses of Bluetooth devices embedded in vehicles and in cell phones of motorists. They showed that the use of Bluetooth technologies for capturing OD data in limited networks is a cost-effective solution. Kostakos et al. [5] introduced an automatic method to collect passengers' end-to-end trip data. They collected the location of the bus, the ticket data, and the number of people on the bus using a Bluetooth detection sensor. They calculated the OD matrix and related graphs and analyzed them to optimize transit plans by redesigning routes and providing new services [5].

1.3 Contributions

This paper introduces an intelligent and automated system to recognize the Wi-Fi and/or Bluetooth MAC addresses that belong to persons in the bus.
This system is based on defining some features and clustering them into distinct groups. Experiments are conducted to show the performance of this method for real-world applications. Section 2 briefly reviews the clustering approaches used in this paper. Section 3 presents the test setup and the experiment design. Section 4 discusses cluster modeling and analysis.

2 Main Approaches for Clustering

2.1 Center-Based Clustering

Center-based clustering refers to a class of clustering techniques in which the clusters' centroids are calculated for a user-specified number of clusters. After that, data points are assigned to these clusters such that every cluster contains the set of data points that are most similar (closest in distance) to its centroid [6]. Center-based clustering techniques mainly include K-means, fuzzy K-means, and K-medoids. The K-means algorithm divides data points into groups of equal variance by minimizing the within-cluster sum of squared error. The K-means algorithm attempts to cluster a set of N data points into K disjoint clusters, where each cluster centroid is the mean μ_j of its data points. The cost function is the within-cluster sum of squared error (in the Euclidean norm) and is given by [7]:

\sum_{i=0}^{n} \min_{\mu_j \in C} \left( \| x_i - \mu_j \|^2 \right)   (1)

The K-means algorithm is sensitive to noise and outliers. To overcome this issue, the K-medians algorithm uses the Manhattan norm (instead of the Euclidean norm l2) as the distance between data points [8]. The medoid is defined as the most centrally located object within a cluster, i.e., the object that has the smallest average dissimilarity to the other objects in the cluster. Compared to K-means, K-medoids is more robust to noise and outliers [8]. K-means and K-medoids are both exclusive clustering techniques [6] in which every data point is assigned to a single cluster. There are many cases in which a data point may belong to more than one cluster with a specific probability. Fuzzy K-means clustering assigns every data point to every cluster with a membership weight between 0 and 1. A membership of 0 means that the object does not belong to the cluster, whereas a membership of 1 means that it fully belongs. It is assumed that the sum of the weights (probabilities) for each object is equal to 1.

2.2 Graph-Based Clustering

Graphs are used to represent data in some data mining applications, in which the nodes are data points and the links are the connections among data points [6]. Agglomerative hierarchical clustering is an example of graph-based clustering. It starts with every data point as a single cluster. After that, new clusters are repeatedly generated by merging the two nearest clusters until a single cluster that includes all data points is produced [6]. The key idea of hierarchical clustering is the calculation of the proximity function between two clusters. There are several metrics for calculating the proximity function used to merge the two nearest clusters. They mainly include [7, 9]: (1) the Ward metric, which minimizes the sum of squared differences of data points inside a cluster; (2) the maximum metric, which minimizes the maximum distance between data points of every two clusters; and (3) the group average metric, which minimizes the average of distances between all data points of every two clusters.
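As a concrete illustration of the center-based approach, the following is a minimal Python sketch of K-means (whose within-cluster sum-of-squares objective is Eq. (1)) using scikit-learn on synthetic two-dimensional data; the data and parameter values are placeholders, not the Wi-Fi/Bluetooth features used later in this paper.

# Minimal K-means sketch: three synthetic blobs are clustered, and the inertia_
# attribute reports the within-cluster sum of squared errors minimized in Eq. (1).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ((0, 0), (3, 0), (0, 3))])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("centroids:\n", km.cluster_centers_)
print("within-cluster sum of squares:", round(km.inertia_, 2))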
2.3 Density-Based Clustering

The key idea of density-based clustering is that a cluster is a dense region of data points surrounded by a region of low density. This idea is used to create clustering algorithms that perform well when clusters are irregular or intertwined, as well as in situations that include noise and outliers [6]. In such situations, center-based or graph-based clustering approaches cannot deliver a satisfactory performance. Density-based clustering techniques find regions of high density that are separated from each other by low-density regions. DBSCAN [6] is one of the most effective density-based clustering techniques; it determines the number of clusters automatically and generates partitioned clusters. Moreover, it can isolate data points in low-density regions as noise and remove them from the clustering subspace. A center-based density metric is used to quantify the density of data points. It may be calculated by counting the number of data points located within a specified radius, named Eps, of every point [6]. The center-based density metric classifies each data point into one of three main categories: core points, border points, and noise points. A core point is a point located inside a density-based cluster. A border point is a point that is not a core point but is located within a close neighborhood of a core point. A noise point is a point that is neither a core point nor a border point and is located relatively far from the centroids [6].
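The sketch below illustrates the density-based idea with scikit-learn's DBSCAN on toy data: points in low-density regions receive the label -1 and can be discarded as noise. The Eps and min_samples values are arbitrary choices for this synthetic example, not settings used in this study.

# DBSCAN sketch: a dense synthetic cluster plus scattered low-density points;
# points labeled -1 are treated as noise and removed, mirroring the noise-removal
# role of density-based clustering described above.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
dense = rng.normal(loc=(0, 0), scale=0.2, size=(80, 2))
noise = rng.uniform(low=-3, high=3, size=(20, 2))
X = np.vstack([dense, noise])

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
kept = X[labels != -1]
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("points kept after noise removal:", len(kept), "of", len(X))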
3 Test Setup and Experiment Design

3.1 Sensing Device: Smats TrafficBox™

The Smats TrafficBox™ is a pole-mount, battery-operated Bluetooth and Wi-Fi sensor that was designed and built at SMATS Traffic Solutions Inc. The sensors operate inside a ruggedized shockproof and waterproof case, which makes them ideal for tasks that require placing the sensor at a specific location to collect data for several days: the unit can scan for up to 4 days on one charge. The ruggedized case is equipped with a pole-mount configuration, so it can scan for days without the need for monitoring. TrafficBox™ sensors can collect data on moving vehicles as well as in stationary positions. Sensors have adjustable detection zones that cover a circular or a directional area for detecting Bluetooth and Wi-Fi devices. Figure 1 shows a typical TrafficBox™ mounted on a pole. TrafficBox™ detects Bluetooth Classic and Low Energy devices. Note that Bluetooth devices are most easily detected in discovery mode: if a device is in this mode, the chance of detection by a sensor is extremely high, but few Bluetooth devices are in this mode. TrafficBox™ can additionally detect Bluetooth devices in paired mode, in which two devices are connected and communicating with each other.

Fig. 1. A typical Smats TrafficBox™ device that collects Bluetooth and Wi-Fi data.

TrafficBox™ not only stores data offline, but can also send data in real time for online storage and real-time traffic monitoring. For offline data collection, the data is saved onto a micro SD card. The data are later uploaded to a computer as a raw dataset, or are uploaded to the Smats cloud server where they can be analyzed in its analytics platform. TrafficBox™ sensors collect the following data: MAC addresses, detection time stamps, type of device (Bluetooth or Wi-Fi, with Bluetooth Low Energy optional), the signal strength, and GPS location data.

3.2 Experiment Design

Ground-truth experiments were conducted using public urban transit buses traveling in the city of Ottawa. TrafficBox™ is placed inside the bus to collect MAC address data under two different test scenarios, each corresponding to a specific route. Note that the collected raw data contain noise and outliers that mainly correspond to MAC addresses outside the bus. Before feeding the raw data into clustering algorithms, they need to pass through some preprocessing steps (see Sect. 3.3). After clustering the MAC addresses and identifying the ones that belong to passengers on the bus, they can be used for further applications, including calculation of the OD matrix, estimation of the wait and travel time for every passenger, optimization of bus transit plans, etc. Two routes are considered for the tests, where each realizes a test scenario. The first test uses route 101, which starts at the St. Laurent 3C station and ends at the Bayshore 1A station. The GPS data are used to locate bus stops over time. Figure 2 shows a Google map view of route 101 used in test scenario #1.

Fig. 2. Google map view of route 101 in the city of Ottawa.

The second test uses route 85, which starts at the Bayshore 4B station and ends at the Lebreton 2A station. Figure 3 shows a Google map view of route 85. A large part of route 85 passes through downtown Ottawa, where it is usually more crowded than route 101. Route 85 is used to check the performance of the clustering algorithms on scenarios that include a large number of passengers, crowded streets, and crowded bus stations. Note that during the experiments, the number of passengers in the bus, as well as the number of entries and exits at every stop, was manually counted. These numbers are later used to intuitively check the performance of the clustering algorithms. Data collected by TrafficBox™ are uploaded to a computer using a USB port.

Fig. 3. Google map view of route 85 inside the city of Ottawa.

3.3 Data Cleaning and Preprocessing

The collected Bluetooth and Wi-Fi data include MAC addresses that belong to all detected devices within a certain range. This range may be changed by replacing the passive scanner antenna of TrafficBox™. However, under real practical conditions, the range depends on factors such as the weather, indoor obstacles, obstruction by urban infrastructure, etc. For the two test scenarios in which TrafficBox™ is placed inside the bus, the range for Wi-Fi/Bluetooth detections is estimated to be about 200 m. TrafficBox™ generates a CSV file that includes the MAC address, the device type, the signal strength, location coordinates, and the time stamp for every detection. Note that sensors only detect Wi-Fi MAC addresses of devices that are actively communicating with the network; in contrast, sensors detect all paired Bluetooth devices without the need for them to be communicating with another source. The raw data collected by the sensors contain a considerable amount of noise, outliers, and other inconsistencies. For instance, at every bus stop, sensors detect MAC addresses that belong to boarding passengers as well as ones that belong to pedestrians, non-passengers, or other individuals. Sensors may furthermore detect MAC addresses that belong to other moving vehicles near the bus, or to other individuals whose distance from the bus is less than 200 m. Moreover, stationary Wi-Fi routers may have a long detection range, and they should also be considered a source of noise [1].
In practical situations, some passengers may turn their Bluetooth and/or Wi-Fi devices on or off during the trip [1]. Hence, it is sometimes difficult to recognize noise and other outlier MAC addresses, even by eye. In this context, to alleviate the negative impact of noise and outliers, some preprocessing steps are recommended. In these steps, soft thresholds (instead of strict thresholds that completely remove outliers) are defined and applied to the raw data to remove outstanding outliers; the remaining outliers are automatically removed through clustering. In this research, data preprocessing is performed in Python 3 with the Pandas library. Dunlap et al. [1] have described preprocessing steps that include applying strict thresholds. This research uses some of their preprocessing steps, but our thresholds are smaller. In the first step, based on the type of device, Wi-Fi MAC addresses are separated from Bluetooth ones; clustering algorithms are applied separately to the Wi-Fi and Bluetooth MAC addresses. In the next step, a threshold is defined on the number of detections Ndetect for every unique MAC address: MAC addresses whose number of detections is smaller than Ndetect are removed. In this research, Ndetect is set to 2, such that

Detections per travel > Ndetect.   (2)

Another important factor for preprocessing is the travel time, defined as the difference in time between the first and the last detection. The next step is to remove MAC addresses whose travel time is smaller than a threshold Ttravel, keeping only

Detections with travel time > Ttravel.   (3)

In this research, the threshold on the travel time for both Bluetooth and Wi-Fi devices is set to Ttravel = 30 s. This means that MAC addresses with a travel time smaller than 30 s are removed. In the final step, unique MAC addresses (Bluetooth and Wi-Fi separately) are identified, and the average of their signal strength over all detections is calculated. After that, only MAC addresses with an average signal strength greater than a threshold Sstrength are kept, such that

Average signal strength > Sstrength.   (4)

In this research, the threshold on the average signal strength for Wi-Fi and Bluetooth detection data is set to Sstrength = −80 dB. This means that MAC addresses with an average signal strength smaller than −80 dB are filtered out.
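A minimal sketch of how the soft thresholds of Eqs. (2)-(4) might be applied with pandas is shown below (the paper states that preprocessing was done in Python 3 with the Pandas library, but the column names 'mac', 'rssi', 'timestamp', and 'device_type' are assumptions made here for illustration). The per-MAC aggregates computed along the way are also the quantities used as features in the next subsection.

# Sketch of the preprocessing thresholds: aggregate detections per unique MAC address,
# then keep only addresses with enough detections, a long enough travel time, and a
# strong enough average signal.
import pandas as pd

N_DETECT, T_TRAVEL_S, S_STRENGTH_DB = 2, 30, -80

def preprocess(df: pd.DataFrame, device_type: str) -> pd.DataFrame:
    d = df[df["device_type"] == device_type]
    agg = d.groupby("mac").agg(
        n_detections=("rssi", "size"),
        avg_rssi=("rssi", "mean"),
        var_rssi=("rssi", "var"),
        first_seen=("timestamp", "min"),
        last_seen=("timestamp", "max"),
    )
    agg["travel_time_s"] = (agg["last_seen"] - agg["first_seen"]).dt.total_seconds()
    keep = (
        (agg["n_detections"] > N_DETECT)              # Eq. (2)
        & (agg["travel_time_s"] > T_TRAVEL_S)         # Eq. (3)
        & (agg["avg_rssi"] > S_STRENGTH_DB)           # Eq. (4)
    )
    return agg[keep]

# Example usage (assuming a raw_detections DataFrame with the columns above):
# wifi = preprocess(raw_detections, device_type="wifi")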
3.4 Feature Extraction and Feature Engineering

Clustering refers to the task of dividing data points into groups such that data points in the same group have more similar properties than data points in other groups. In this context, clustering algorithms can be used to detect anomalies (discords). Anomalies are unusual or unexpected patterns that occur in a dataset [10]. To use clustering algorithms for the anomaly detection of time-series data, there are three main approaches [10]: (1) model-based approaches, (2) feature-based approaches, and (3) shape-based approaches. In the model-based approach, a parametric model is created for each time-series dataset; the raw time series is thereby converted into model parameters. A suitable model distance and a clustering algorithm are then selected to cluster the dataset into groups. In the feature-based approach, every time-series dataset is converted into a feature vector, and the clustering algorithm is then applied to the feature vectors to divide them into distinct groups. The third approach is shape-based clustering, in which the shapes of time-series datasets are compared based on a similarity index; some nonlinear stretching and contracting transformations are initially applied to the datasets to match them as much as possible [10]. In this research, the feature-based approach is used: every time-series MAC address sensing dataset (after passing through the preprocessing steps) is converted into a feature vector. The generated feature vectors are then fed into clustering algorithms so that the MAC addresses that belong to passengers inside the bus are clustered into one group. Note that clustering algorithms divide datasets into groups based on the statistical properties of the features. In this research, the feature vector is defined from statistical properties of the MAC addresses that belong to passengers inside the bus. It is given by:

x := [ avg(s)  var(s)  n  ΔT ]^T,   (5)

where avg(s) and var(s) are, respectively, the average and the variance of the signal strength values, calculated over all detections of every unique MAC address. Moreover, n and ΔT are the number of detections and the travel time for each MAC address, respectively. The number of feature vectors is equal to the number of unique MAC addresses. Note that before clustering, the feature vectors are normalized so that they have zero mean and unit Euclidean norm.

4 Cluster Modeling and Analysis

Most of the classic clustering algorithms need the number of clusters as an input, e.g., K-means clustering, K-medoids clustering, hierarchical clustering, etc. In contrast, some advanced clustering algorithms automatically select the number of clusters, e.g., Affinity Propagation, Mean shift, DBSCAN, etc. Hence, to apply the classic clustering algorithms, the optimal number of clusters is required. In this context, there are statistical measures in the literature [11] (e.g., the Davies-Bouldin index, Silhouette analysis, etc.) that may be used to determine the best number of clusters.

4.1 Number of Clusters

The Silhouette analysis is used in this research to determine the optimal number of clusters for the classic clustering algorithms. Silhouette analysis [12] is a powerful tool for interpretation and validation of the consistency within clusters of data points. It is mainly based on evaluating the separation distance between the clusters generated by a clustering algorithm [12]. The Silhouette analysis provides an index that shows how similar a data point is to its own cluster (cohesion) compared to other clusters (separation). This index lies in the range [−1, +1], where a high value near +1 indicates that the corresponding datum is well matched to its cluster and far from neighboring clusters. An index of 0 indicates that the corresponding data point is very close to the decision boundary between two neighboring clusters, and a negative index indicates that the datum has been assigned to the wrong cluster [12]. The Silhouette index can furthermore be used to visually determine the proper number of clusters. The Silhouette index is calculated from the mean intra-cluster distance a and the mean nearest-cluster distance b for each data point [12]. The Silhouette coefficient s(i) for data point i is given by [12]:

s(i) = (b(i) − a(i)) / max{a(i), b(i)}.   (6)

Note that b(i) is the mean distance between data point i and the points of the nearest cluster to which i does not belong. It follows from Eq. (6) that −1 ≤ s(i) ≤ +1.
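A minimal sketch of the Silhouette-based selection of the number of clusters described in Sect. 4.1 is given below, assuming scikit-learn and using synthetic stand-ins for the normalized feature vectors of Eq. (5): K-means is fitted for several candidate values of k, and the k with the largest mean Silhouette coefficient is kept.

# Silhouette-based choice of the number of clusters: fit K-means for k = 2..6 on
# synthetic 4-dimensional feature vectors and report the k with the highest score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(60, 4))
               for c in ((0, 0, 0, 0), (4, 0, 0, 0), (0, 4, 0, 0))])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 2) for k, s in scores.items()}, "-> best k =", best_k)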
Figure 4 presents values of the Silhouette index versus the number of clusters for Wi-Fi data under the two test scenarios. According to Fig. 4, the optimal number of clusters for both test scenarios is equal to 3, since the corresponding Silhouette index for each scenario has the largest value. Moreover, Fig. 5 presents a graphical representation of the Silhouette index obtained by the K-means algorithm. Figure 5 confirms that clustering the data into 3 clusters results in well-separated groups of data points, where all clusters pass the average Silhouette index (i.e., the dashed line). Due to lack of space, this paper only presents results corresponding to Wi-Fi MAC address data.

Fig. 4. Values of the Silhouette index versus the number of clusters for Wi-Fi data.

Fig. 5. Graphical representation of the Silhouette analysis for 3 clusters (Wi-Fi MAC addresses).

Tables 1 and 2 present numeric values of the Silhouette index versus the number of clusters obtained by the K-means algorithm for each test scenario. As presented, independent of the cluster number, the Silhouette index has a positive value close to 1, which confirms the proper performance of the K-means algorithm for clustering Wi-Fi data. For both scenarios, the optimal value of the cluster number is equal to 3.

Table 1. Silhouette index versus cluster numbers under test scenario #1

Number of clusters:      2     3     4     5     6
Silhouette coefficient:  0.68  0.72  0.71  0.57  0.53

Table 2. Silhouette index versus cluster numbers under test scenario #2

Number of clusters:      2     3     4     5     6
Silhouette coefficient:  0.51  0.57  0.54  0.54  0.55

4.2 Building Cluster Models

In this research, several algorithms are selected from the three clustering approaches discussed above. They are applied to the feature vectors, and their performance in recognizing Wi-Fi MAC addresses is compared under the two test scenarios. Note that the feature vectors are generated from the preprocessed data, and hence the outstanding noise and outliers have already been removed. The K-means, fuzzy K-means, and K-medians clustering algorithms are selected from the center-based approach. The agglomerative hierarchical clustering and the spectral clustering algorithms are selected from the graph-based approach. The DBSCAN and the Gaussian mixtures algorithm come from the density-based approach. All of the above algorithms, except DBSCAN, need the number of clusters as an input. As discussed in Sect. 4.1, the optimal number of clusters is equal to 3. In this context, cluster 1 contains Wi-Fi MAC addresses that certainly belong to persons traveling by the bus, cluster 2 represents the ones that certainly belong to persons outside the bus, and cluster 3 contains MAC addresses that more likely belong to people outside but near the bus. The decision on the cluster labels is made by looking at the clusters' centroids. Simulation results need to be checked manually to ensure the proper performance of the algorithms. Note that route 101 mostly passes through areas that are far from the downtown, whereas route 85 mostly passes through the downtown. Hence, test scenario #2 deals with clustering a larger dataset collected from a crowded bus and crowded bus stops.

Fig. 6. Profiles of signal strengths over time for Wi-Fi data collected from route 101.

Fig. 7. Profiles of signal strengths over time for Wi-Fi data collected from route 85.
Figure 6 presents profiles of signal strengths for Wi-Fi MAC addresses of test scenario #1, before and after clustering, and Fig. 7 presents the ones under test scenario #2. The clustered data are obtained using the K-means algorithm. From Figs. 6 and 7, it can be seen that the K-means algorithm successfully separates the Wi-Fi MAC addresses that belong to passengers in the bus under the two different test scenarios. To intuitively check the performance of the clustering algorithms, it is a good idea to look at the clustered features. Figure 8 presents 2D plots of the features related to test scenario #1, clustered using the K-means algorithm. There are three main clusters, whose centroids are represented by the numbers 1, 2, and 3, respectively. The clusters' centroids are surrounded by data points whose feature values are close. Figure 8 shows that the K-means algorithm successfully clusters the data points into three groups based on their feature values.

Fig. 8. 2D plots of clustered features generated by K-means clustering for Wi-Fi data of route 101.

4.3 Performance Evaluation of Clustering Algorithms

There are two main approaches for evaluating the performance of clustering algorithms. The first approach concentrates on defining a statistical measure that numerically quantifies how well similar data points are clustered into a group, without knowing the labels. The second approach requires knowledge of the ground-truth classes (similar to supervised learning) and is based on the manual assignment of labels to data points during the experiments. In this paper, the Silhouette analysis was employed to determine the optimal number of clusters. Note that values of the Silhouette index may also be used as a statistical measure for evaluating the clustering performance. The Silhouette index for the first and the second test scenario, assuming 3 clusters, is obtained as 0.72 and 0.57, respectively. Values of the Silhouette index versus the number of clusters were presented in Fig. 4. As shown, the values of the Silhouette index are positive and relatively close to 1, and hence the proper performance of the K-means algorithm for clustering similar data is statistically confirmed. To follow the second approach and evaluate the clustering performance manually, the Wi-Fi MAC address data obtained from the two test scenarios are labeled. After that, the accuracy of the clustering algorithms is evaluated based on metrics that include the Adjusted Rand Index [7], the Adjusted Mutual Information index [7, 13], the Homogeneity-Completeness-V-measure score [7], etc. Homogeneity checks whether each cluster K contains only members of a single class C [7]. Completeness checks whether all members of a given class C are assigned to the same cluster K [7]. Both Homogeneity and Completeness scores are in the range [0, 1], where a larger value represents better performance. The homogeneity and completeness scores are respectively calculated by [7]

h = 1 − H(C|K) / H(C),   (7)

c = 1 − H(K|C) / H(K),   (8)

where H(C|K) is the conditional entropy of the classes given the cluster labels and is calculated by [7]:

H(C|K) = − \sum_{c=1}^{|C|} \sum_{k=1}^{|K|} \frac{n_{c,k}}{n} \log\left(\frac{n_{c,k}}{n_k}\right),   (9)

and H(C) is the entropy of the classes, calculated by [7]:

H(C) = − \sum_{c=1}^{|C|} \frac{n_c}{n} \log\left(\frac{n_c}{n}\right).   (10)

Note that n is the number of data points, n_c and n_k are respectively the numbers of data points that belong to class c and cluster k, and n_{c,k} is the number of data points from class c that are assigned to cluster k [7]. Moreover, the harmonic mean of Homogeneity and Completeness is referred to as the V-measure and is used to evaluate the agreement of two independent assignments on the same dataset [7, 13]. The V-measure score ranges over [0, 1] and is calculated by [7]:

v = 2 × (h × c) / (h + c).   (11)
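The scores of Eqs. (7)-(11) are available directly in scikit-learn; the following small sketch, with hypothetical ground-truth and cluster labels standing in for the manually labeled MAC addresses, shows how they might be computed.

# Homogeneity, completeness, and V-measure for a toy labeling; the label values
# (0 = on-bus passenger, 1 = outside the bus, 2 = near the bus) are illustrative only.
from sklearn.metrics import homogeneity_completeness_v_measure

true_labels    = [0, 0, 0, 1, 1, 2, 2, 2, 1, 0]
cluster_labels = [0, 0, 0, 1, 1, 2, 2, 1, 1, 0]   # output of some clustering algorithm

h, c, v = homogeneity_completeness_v_measure(true_labels, cluster_labels)
print(f"homogeneity={h:.3f} completeness={c:.3f} v-measure={v:.3f}")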
Table 3 presents the Homogeneity-Completeness-V-measure scores calculated for the clustering algorithms under test scenario #1. According to Table 3, the K-means, the hierarchical, and the spectral clustering algorithms have the best performance. Note that in this research some of the clustering algorithms (e.g., DBSCAN and the Affinity Propagation algorithm) did not show an acceptable performance and hence are not considered for comparison.

Table 3. Values of homogeneity-completeness-V-measure scores for test scenario #1

Clustering algorithm      Homogeneity score   Completeness score   V-measure
K-means                   0.896               0.953                0.924
K-medians                 0.821               0.818                0.820
Fuzzy K-means             0.893               0.886                0.890
Hierarchical clustering   0.896               0.953                0.924
Gaussian Mixture          0.857               0.852                0.855
Spectral clustering       0.896               0.953                0.924

5 Conclusion

This paper presented applications of clustering algorithms for removing noise and outliers from Wi-Fi and Bluetooth MAC address detections. To estimate the traffic load and provide an intelligent automated transit plan for public transit buses, it is important to separate MAC addresses that belong to passengers in the bus from those that belong to persons outside the bus. Wi-Fi and Bluetooth detection data were initially passed through preprocessing steps that included applying thresholds to remove outstanding noise and outliers. After that, clustering algorithms were used to automatically filter out the noise based on four features: (a) the average of the signal strength over all detections; (b) its variance; (c) the number of detections; and (d) the travel time. The performances of the clustering algorithms were moreover compared in terms of the Homogeneity-Completeness-V-measure score. It is concluded that the K-means, the hierarchical clustering, and the spectral clustering algorithms had the best clustering performance. Future studies include using the clustering algorithms for origin-destination (OD) estimation, predicting the traffic load at each bus stop, and building an automated intelligent transit plan for public transit buses.

Acknowledgments. This research was supported by the Ontario Centres of Excellence (OCE) Grant 27911–2017 and NSERC Engage Grant EGP 514854–17, in collaboration with SMATS Traffic Solutions.

References

1. Dunlap, M., Li, Z., Henrickson, K., Wang, Y.: Estimation of origin and destination information from Bluetooth and Wi-Fi sensing for transit. Transp. Res. Rec. J. Transp. Res. Board 2595, 11–17 (2016)
2. Ji, Y., Zhao, J., Zhang, Z., Du, Y.: Estimating bus loads and OD flows using location-stamped farebox and Wi-Fi signal data. J. Adv. Transp. 2017
3. Kostakos, V., Camacho, T., Mantero, C.: Towards proximity-based passenger sensing on public transport buses. Pers. Ubiquitous Comput. 17(8), 1807–1816 (2013)
4. Blogg, M., Semler, C., Hingorani, M., Troutbec, R.: Travel time and origin-destination data collection using Bluetooth MAC address readers.
In: Australasian Transport Research Forum, vol. 36 (2010)
5. Kostakos, V., Camacho, T., Mantero, C.: Wireless detection of end-to-end passenger trips on public transport buses. In: 13th IEEE International Conference on Intelligent Transportation Systems (ITSC), Funchal, Madeira Island, Portugal, pp. 1795–1800 (2010)
6. Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Pearson Education Inc., Boston (2006)
7. Calinski, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3, 1–27 (1974)
8. Park, H., Jun, C.H.: A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl. 36(2), 3336–3341 (2009)
9. Rafsanjani, M., Varzaneh, Z., Chukanlo, N.: A survey of hierarchical clustering algorithms. J. Math. Comput. Sci. 5(3), 229–240 (2012)
10. Aghabozorgi, S., Shirkhorshidi, S., Wah, T.: Time-series clustering: a decade review. Inf. Syst. 53, 16–38 (2015)
11. Legany, C.: Cluster validity measurement techniques. In: Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain (2006)
12. Muca, M., Kutrolli, G., Kutrolli, M.: A proposed algorithm for determining the optimal number of clusters. Eur. Sci. J. 11(36), 1857–7881 (2015)
13. Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external cluster evaluation measure. In: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague (2007)

Economic and Performance Based Approach to the Distribution System Expansion Planning Problem Under Smart Grid Framework

Hatem Zaki1(✉), R. A. Swief2(✉), T. S. Abdel-Salam2(✉), and M. A. M. Mostafa2(✉)
1 BC Hydro, Vancouver, BC, Canada
hatemzaki@mail.com
2 Ain Shams University, Cairo, Egypt
rania.swief@gmail.com, tarekabdelsalam@gmail.com, mahmoud.a.mostafa@hotmail.com

Abstract. This paper proposes a new vision of the Distribution System Expansion (DSE) problem considering new system performance measures. The mathematical model has been rebuilt with a new combined multi-objective formula, minimizing the system expansion capital costs and the Operations and Maintenance (OM) costs while achieving the best combined performance measure, consisting of a combination of Reliability, Resiliency and Vulnerability. A new practical weighted combined system performance index is applied and tested for use by utilities in place of the common simple reliability indices. The new model applies multi-objective optimization with mixed-integer design variables and includes a combination of seven logical and technical constraints to provide the best description of the real existing system constraints. In addition to the new proposed system performance index, a new algorithm for checking the system's radial topology is proposed. The objective is to find the optimum sizing, timing and location of substations in the power distribution network. The proposed approach has been tested on a 14-bus real distribution system to demonstrate its validity and effectiveness on real systems. The proposed approach has also been tested on the IEEE 37-bus model distribution system with modified parameters that are significantly larger and more complex than the parameters frequently found in the literature.
Keywords: Distribution system expansion · Smart grids · Reliability · Resiliency · Vulnerability · Genetic algorithm

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 313–332, 2019. https://doi.org/10.1007/978-3-030-02686-8_25

1 Introduction

The distribution system is a vital part of the electric power system, intended to connect the transformer substations and the customers. DSE is a fundamental task for system planners, asset managers, and operators. DSE is usually driven by the need to add capacity to the system due to load growth and the inability of existing systems to serve future loads. Finding an optimized solution to the DSE problem helps in making right, sound and justifiable decisions, and forms a good defense for any investment decision, especially a decision related to building or expanding a transformer substation with a significant investment. This investment may impact electricity rates and hence affect the financial performance of the utility. DSE planning involves decision making with multiple conflicting criteria, such as capital investment and OM costs, energy losses, and reliability. A utopian solution that optimizes all these objectives at the same time does not exist. Instead, a set of optimal trade-off solutions exists, in which an improvement in any objective leads to deterioration in other objectives. For example, a reduction in the investment cost by using smaller cross-sectional-area conductors or a lower-class metal (such as using aluminum instead of copper) increases energy losses and limits the ability to transmit power over longer distances, hence dictating lower utilization of assets [1]. In traditional research, the DSE problem was modeled using a single-objective formula, often called the cost function. This cost function followed the model presented by Gonen et al. in several publications, with many improvements over the years [2]. This model was further improved and may now include non-financial measures such as energy losses or reliability after aggregating them to fit into the financial formula [1]. Even though Gonen's school of thought dealt with a wide spectrum of DSE problems, it lacked the smart grid dimension of including performance as a measure when making such decisions. In today's decision making under Smart Grid (SG) approaches, system performance is a vital characteristic of the DSE problem and must be included among the objectives when making such a significant decision [3]. Several performance measures have been proposed by researchers in the past, most of them based on the Energy (or Demand) Not Served (ENS) principle [4]. Some researchers and utilities included reliability in the form of customer choices as a newer direction in reliability measures; this formed a means of addressing customer expectations and was called Customer Based Reliability (CBR) [5]. Many utilities have used one or a combination of common reliability indices as a decision-making objective for system expansions and alterations [6]. Lately, after a few destructive storms in North America, the IEEE standards and guides introduced the concept of resiliency as a performance measure [7]. The resiliency of a system is related to the common Customers Experiencing Lengthy Interruption Durations reliability index (CELID), but it looks at outages of extremely long duration, essentially more than 12 h.
Researchers and planners have used resiliency indices for allocating sectionalizing switches on distribution system feeders to avoid entire feeder outages in case of contingencies [8]. Resiliency measures can be applied to a type of outage characterized by being significant but expected (such as storms and hurricanes); these outage causes have a local impact on a limited footprint. In this paper, a new modified resiliency index is applied, combining the commonly used resiliency index with the number of years under study. Vulnerability of a system can also be one of the performance measures of integrated systems. It has been widely used to assess the cyber-security of data management and control systems, including power system Supervisory Control And Data Acquisition (SCADA) [9]. Originally, vulnerability was widely applied to water systems and electric power generation systems; nowadays it is applied to electric power transmission systems with a quantitative risk approach. Vulnerability has several definitions based on the infrastructure it addresses. In the electric power system, vulnerability can be defined as the impact and likelihood of the outage of critical equipment in the system [10]. It describes the ability of the system to stay in service under an unusual disastrous event such as a significant destructive earthquake with an unpredicted destructive area or a terrorist attack. Recently, invulnerability has been applied to distribution system planning as a consideration using graph theory, by ranking all nodes in terms of their criticality with respect to the source node [11]. This approach forms a basis for understanding vulnerability and the criticality of the assets of a distribution system; however, a vulnerability measure was not presented in that recent research. In this paper, a new weighted combination of reliability, resiliency and vulnerability indices is proposed for application to distribution systems. These indices cover all expected and unexpected outage causes that may affect the distribution system infrastructure. A new index is then formed and used in the objectives of the DSE planning problem. To solve the newly formulated DSE model, multi-objective optimization (also called multi-criteria optimization, multi-performance or vector optimization) is used with an evolutionary solution algorithm [12]. Multi-objective optimization can be defined as the problem of finding a vector of decision variables which satisfies the constraints and optimizes a vector function whose elements represent the objective functions [13]. After the significant improvements in computer software and the evolution of Artificial Intelligence and nature-inspired techniques for solving complex multi-objective optimization problems [14], researchers have included reliability as an additional part of the objective function. Most researchers who have included reliability as a separate objective have used the Energy (or Demand) Not Served (ENS) concept as their main means of modelling reliability, such as Cossi et al. [3, 15]. A powerful class of heuristic optimization methods is the family of metaheuristic techniques. The Genetic Algorithm (GA) became particularly suitable for the DSE problem once a well-established formulation for dealing with multi-objective problems had been achieved [16].
In this paper, a commonly available Multi-Objective GA (MOGA) is used as a means of finding the optimum, or near-optimum, solution, with applications to modified IEEE test cases as well as real-life test cases. This paper is divided into six sections. In addition to this introduction, Sect. 2 describes the DSE problem, including the newly proposed parameters, and presents a new approach for determining the radial structure of the distribution system during the solution algorithm. The mathematical formulation and the solution algorithm are discussed in Sects. 3 and 4, respectively. Test cases are presented and discussed in Sect. 5, and a conclusion is provided in Sect. 6.

2 Problem Description

The DSE problem is usually represented as a mixed-integer multi-variable problem [17]. The variables in this paper represent the substation locations and the line segment status (opened/closed, or in-service/out-of-service). The model is presented to find the optimum size, timing, and location of the distribution substation, as well as to determine the optimum status of each line section (opened or closed) recommended for operations [18]. The optimum line section status hence identifies the system configuration. The model used in this paper was evaluated and many complexities were added to make it as close as possible to real systems. The developed model is simple but includes all objectives and constraints necessary to plan and operate the system. These constraints can be divided into logical and technical constraints. The logical constraints ensure that the solution provides a radial system, that all nodes are connected to one substation, and that one of the new substations is selected while the existing substations remain in service. The technical constraints include the voltage limits, the line segment conductor current thermal limit, and the power balance of the system (supply capacity equals total load). In this paper, the objective function is formed of two parts. The first part is the total life cycle asset cost, which includes the installation capital costs and the Operations and Maintenance (OM) costs. This part is represented by a Cost Index (COSTINDEX): the capital cost plus the present value of the life cycle costs, referred to the maximum asset cost of the system. The purpose of this referral is to normalize the value obtained and make it homogeneous with the other components of the objective function. The second part represents a combined system Contingency Index (CONDEX). This index consists of three weighted components, giving the planner the choice to prioritize one component over another by adjusting the three weights as required by the utility's strategic approach. CONDEX is formed of the following components:

(a) The Unified Reliability Index (URI) – this index has previously been used by utilities as a sole indicator of reliability [19]. It is formed of four (or more) common reliability indices: the System Average Interruption Frequency Index (SAIFI), the System Average Interruption Duration Index (SAIDI), the percentage of Customers Experiencing Multiple Interruptions of 4 or more (CEMI-4), and Customers Experiencing Lengthy Interruption Durations of 6 or more hours (CELID-6). In this paper, only these four indices are used due to the practical nature of the distribution system.
Other indices would require special, unusual measuring equipment to provide enough data for their calculation.

(b) The System Resiliency Index (SRI) – This is one of the reliability indices, but with a more stringent condition. The SRI is measured using CELID-12, which represents the percentage of customers experiencing outage durations of 12 h or more per year (or per study period). This definition is also provided by IEEE Std 1366-2012 and has been used by many utilities across the world [7]. SRI was slightly modified to include a measure of the past number of years, in order to add an argument that expresses the period over which the resiliency events occur during a certain study period. For example, if the study period is 5 years and 12-h outages occurred in 3 of the 5 years, then SRI becomes the sum of CELID-12 and the number 3 (assuming the total number of customers has not changed over the study period). This makes the SRI range anywhere from zero to six for a study period of five years. By using this methodology for calculating SRI, both the number of customers and the outage periods are included in this performance measure.

(c) The System Vulnerability Index (SVI) – This is the new index presented in this work. SVI represents the ability of the distribution system to stay in service during and after a massive disaster such as a massive earthquake, a one-of-a-kind storm with destructive wind speeds (not annual storms), large permanent floods, etc. To use a predictive vulnerability index, three weighted arguments are created and selected to form the SVI. This performance index is a function of the following arguments:
– the Node Distance Index (NDI), which represents the distance between each node and its source (a substation in most cases);
– the Node Failure Rate Index (NFRI), which represents the failure rate of each node's route as linked to its source;
– the Node Failure Duration Index (NFDI), which represents the failure duration of each node's route as linked to its source.

SVI is then formed as the sum of the weighted values of NDI, NFRI, and NFDI. Each of these measures is weighted according to its criticality for the distribution system planner and combined in the SVI. CONDEX is hence formed of the weighted sum of URI, SRI and SVI. The presence of these weights provides enough flexibility to adjust the system configuration according to the highest-priority index, in line with the strategy of the utility. The above-mentioned objectives are subjected to a number of constraints. These constraints limit the optimum solution to a practical, implementable solution, making the model as close as possible to the systems implemented in real life. These constraints are explained further in the Mathematical Formulation section. Figure 1 shows an overview of the proposed model of the DSE problem objective function. These objectives are subjected to four logical constraints and three typical technical constraints, all considered together in the solution of the DSE problem. These constraints are applied to represent real-world distribution systems, which usually operate under such constraints.

Fig. 1. Overview of the proposed model of the DSE problem

One of these constraints has also been rebuilt, with a new model, to better represent real systems.
This constraint is the radiality constraint, in which the final solution must respect the radial nature of the distribution power system to be operated.

2.1 Checking the Radial Structure of the System

In the DSE problem, previous research used a single check-point to identify whether the system is radial. Some algorithms compare the number of nodes to the number of line sections after generating the element-node incidence matrix [20]. The Floyd-Warshall algorithm has also been used to find the shortest path in single-source distribution systems [21]. Another method for representing the radiality constraint is to employ the branch-node incidence matrix [22]. These methods were typically oriented to special cases with stringent conditions and cannot be generalized. By studying these past algorithms, it can be observed that they have worked in the past but were conditioned on one or more of the following:

(a) test systems must NOT have internal loops supplied from the same line;
(b) all systems used have one source, or are modified to have one source before applying the algorithm.

In this paper, all radial structure conditions are combined under one algorithm. The proposed algorithm uses an iterative methodology to check for internal loops within the system. Before it terminates, the algorithm uses a connectivity check to ensure that all nodes are connected to a source, and to only one source. Figure 2 presents an overview of the proposed algorithm.

Fig. 2. Radial structure checking algorithm overview

The proposed radial checking algorithm starts by isolating all power sources of the system (such as DG, energy storage, etc.), turning it into the classical, well-known distribution system supplied by substations. The algorithm then performs the following checks:

(A) Checking for internal loops
Internal loops are nodes and branches on the same feeder emerging from one node on the feeder and terminating on another node on the same feeder. Internal loops in graphs are called cycles (or network cycles). In graph theory, there are many numerical methodologies capable of determining the presence of cycles in a graph [23]. One of these methodologies is the Iterative Loop Counting Algorithm (ILCA). This method is characterized by returning the total number of cycles in a graph, as well as by its ease of programming. ILCA searches for loops by moving along a dynamic path. The use of this dynamic path essentially turns the network into a tree, and the path at any given time is a line from the top of the tree to any of the nodes on the branches. Loops occur whenever a node ID exists in two separate places on the path.

(B) Checking for connectivity to a supply node
The connectivity to a supply (or a substation) can be determined using the well-known Floyd-Warshall shortest-paths algorithm, which is part of graph theory applications [24].

(C) Checking if any node is supplied by more than one source
This is a simple algorithm that also uses the connectivity algorithm explained before to determine whether any node in the network is supplied by more than one substation.

As mentioned in the above explanation, the algorithm for determining the presence of loops extensively uses graph theory. It is very similar to the spanning-tree algorithm, with a different alignment to match the required results.
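The following is a hedged Python sketch of the three checks described above, written with the networkx graph library rather than the paper's own ILCA implementation: it flags internal loops via a cycle basis and verifies that every load node is connected to exactly one substation.

# Radial-structure check sketch: returns 1 if the closed-line configuration is loop-free
# and every load node is fed by exactly one substation, else 0.
import networkx as nx

def radial_flag(nodes, closed_lines, substations):
    g = nx.Graph()
    g.add_nodes_from(nodes)
    g.add_edges_from(closed_lines)              # only in-service (closed) line sections
    if nx.cycle_basis(g):                       # (A) any internal loop disqualifies the plan
        return 0
    for n in set(nodes) - set(substations):
        feeders = [s for s in substations if nx.has_path(g, n, s)]
        if len(feeders) != 1:                   # (B) unfed node or (C) fed by two sources
            return 0
    return 1

# Toy 5-node example: substation 1 feeding a radial tree -> prints 1.
print(radial_flag(nodes=[1, 2, 3, 4, 5],
                  closed_lines=[(1, 2), (2, 3), (2, 4), (4, 5)],
                  substations=[1]))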
3 Mathematical Formulation As mentioned in the Problem Description section, the problem is formed of two objectives. The objective function is formed of two parts to be aggregated and minimized under one representation. In order to build the model on an index basis, the two parts of the objective function are normalized by referring them to maximum values in the system. As such, the mathematical minimization problem can simply be written as:

Minimize $COSTINDEX = F_{Normalized} + CONDEX_{Normalized}$   (1)

Equation (1) describes the overview of the objective function. The components of the objective function are as follows: 3.1 Minimization of Assets Life Cycle Costs (F) The capital investment costs and the net present value of the O&M costs are combined under the following formula:

$F = \frac{1}{C_{st,t,max}} \sum_{t=1}^{T} \left\{ \sum_{i=1}^{stn} \left( C_{st,t} X_i + OM_{st,t} \right) + \sum_{j=stn+1}^{stn+m} \left( C_{l,t} X_j + OM_{l,t} \right) \right\}$   (2)

where F is the total life cycle cost of the assets during the study period; T is the number of years of the study period; stn is the total number of substations, including old and new substations; m is the total number of line sections; $C_{st,t}$ and $C_{l,t}$ are the total investment costs of substation st and line section l at year t; $OM_{st,t}$ and $OM_{l,t}$ are the net present values of the operation and maintenance costs for substation st and line section l at year t; X is the binary design variable reflecting the status of substations and line sections; and $C_{st,t,max}$ is the highest asset cost in the system. In order to accommodate the unit differences, all values were normalized by reference to the highest asset cost in the system. This way, all objective function arguments can be added without compromising units or values. 3.2 Minimization of the Contingency Index (CONDEX)

$CONDEX = A \cdot URI + B \cdot SRI + C \cdot SVI$   (3)

where A is the weighting factor of the Unified Reliability Index (URI), B is the weighting factor of the System Resiliency Index (SRI), and C is the weighting factor of the System Vulnerability Index (SVI). URI is mathematically defined as:

$URI = a_1 \cdot SAIFI + a_2 \cdot SAIDI + a_3 \cdot CEMI\text{-}4 + a_4 \cdot CELID\text{-}6$   (4)

where $a_1, a_2, a_3, a_4$ are the weighting factors of each reliability index; SAIFI is the reliability index known as the System Average Interruption Frequency Index; SAIDI is the reliability index known as the System Average Interruption Duration Index; CEMI-4 is the percentage of Customers Experiencing Multiple Interruptions of 4 or more; and CELID-6 is the percentage of Customers Experiencing Lengthy Interruption Durations of 6 h or more. SRI is mathematically defined as:

$SRI = CELID\text{-}12 + N_{yrs}$   (5)

where CELID-12 is the percentage of Customers Experiencing Lengthy Interruption Durations of 12 h or more for a given number of years, and $N_{yrs}$ is the ratio of the number of years containing outages of 12 h or more to the span of years over which CELID-12 has been applied. It is common to use 5 years in most cases as the ultimate number of years for system resiliency measurement. SVI is mathematically defined as:

$SVI = c_1 \cdot NDI + c_2 \cdot NFRI + c_3 \cdot NFDI$   (6)

where $c_1, c_2, c_3$ are the weighting factors of each vulnerability index, and NDI, NFRI and NFDI are as previously defined in the Problem Description section. The above-mentioned objectives are subject to a number of constraints that make the simulation as close as possible to the system in the field.
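Since the index arithmetic of Eqs. (1)-(6) is linear, it can be sketched compactly. The following hedged Python illustration uses placeholder weights, assumes the reliability and vulnerability inputs (SAIFI, SAIDI, CEMI-4, CELID-6, CELID-12, NDI, NFRI, NFDI) are produced by a separate study, and flattens Eq. (2) into a single asset list rather than the full double summation; none of the function or argument names come from the authors' implementation.

```python
# Hedged sketch of the objective of Eqs. (1)-(6); weights and inputs are
# placeholders, not values from the paper.

def uri(saifi, saidi, cemi4, celid6, a=(0.25, 0.25, 0.25, 0.25)):
    # Eq. (4): weighted sum of the four reliability indices
    return a[0]*saifi + a[1]*saidi + a[2]*cemi4 + a[3]*celid6

def sri(celid12, n_yrs_ratio):
    # Eq. (5): CELID-12 plus the ratio of years containing 12-h-plus outages
    return celid12 + n_yrs_ratio

def svi(ndi, nfri, nfdi, c=(1/3, 1/3, 1/3)):
    # Eq. (6): weighted sum of the three vulnerability arguments
    return c[0]*ndi + c[1]*nfri + c[2]*nfdi

def condex(uri_val, sri_val, svi_val, A=1.0, B=1.0, C=1.0):
    # Eq. (3): planner-chosen weights A, B, C
    return A*uri_val + B*sri_val + C*svi_val

def life_cycle_cost(capital, om_npv, status):
    # Eq. (2), flattened: capital + NPV of O&M for the selected (status = 1)
    # substations and line sections, normalised by the highest asset cost.
    total = sum(x*(c + om) for x, c, om in zip(status, capital, om_npv))
    return total / max(capital)

def cost_index(f_norm, condex_norm):
    # Eq. (1): both terms are assumed to be already normalised
    return f_norm + condex_norm
```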
These constraints consist of two sets: (a) logical and (b) technical, as follows. (a) Logical constraints: i. Radiality of the system – This constraint makes sure the distribution system is optimized as a radial system with no loops. It is implemented by an algorithm, described in Sect. 2.1, returning a flag called RadialFlag. If the flag returns 1, the system is radial; if the flag returns 0, the system still has loops. ii. Connectivity of all nodes to a source – This constraint makes sure all nodes are supplied by at least one source (substation). It is also implemented by an algorithm returning a flag called concheck. If the flag returns 1, the system is healthy and fed by its available sources; if the flag returns 0, the system still has a disconnect. The algorithm uses the path function of graph theory as its basis to check connectivity between nodes and substations. iii. Selection of only one new substation – This is enforced by requiring that the sum of the status variables of all new substations proposed to expand the distribution system equals unity. The formula for this constraint is:

$\sum_{i=1}^{n_{newsubs}} X_i = 1$   (7)

where X is the decision variable of the optimization problem and $n_{newsubs}$ is the number of new substations from which one is to be selected. iv. Keeping the existing substations – If an existing substation has enough useful life, it should be kept in service and must be selected as part of the model. This is enforced by requiring that the product of the status variables of all existing substations in the distribution system equals unity. The formula for this constraint is:

$\prod_{i=1}^{n_{existing\,subs}} X_i = 1$   (8)

where X is the decision variable of the optimization problem and $n_{existing\,subs}$ is the number of existing substations. (b) Technical constraints: i. Voltage limits – The voltages of all system nodes must be within the standard range between a minimum value and a maximum value:

$V_{min} \le V_i \le V_{max}, \quad \forall i = 1, 2, 3, \ldots, n$   (9)

where n is the total number of nodes, not including source nodes; $V_{min}$ and $V_{max}$ are the standard allowable voltage limits; and $V_i$ is the node voltage. ii. Current thermal limit:

$I_i \le I_{max}, \quad \forall i = 1, 2, 3, \ldots, m$   (10)

where m is the total number of line sections, $I_{max}$ is the allowable current thermal limit of the line section conductor, and $I_i$ is the line section current flow. iii. Power balance for each substation – By adding all power flowing out of a substation and comparing this power to the substation capacity, a power balance index can be formulated. This is usually achieved by performing a load flow and adding the power flows in the first section of each feeder emerging from each substation. The condition can be expressed as:

$Powerflag = \begin{cases} 1, & \text{if } \sum_{i=1}^{n_{feeders}} PowerFlow_i \,/\, Substation_{capacity} > 1 \\ 0, & \text{if } \sum_{i=1}^{n_{feeders}} PowerFlow_i \,/\, Substation_{capacity} < 1 \end{cases}$   (11)

where $n_{feeders}$ is the number of feeders emerging from a substation. 4 Solution Methodology The solution methodology of the DSE problem, as modeled in this paper, starts by storing the values and parameters of the original system for comparison purposes. The methodology then calculates the objective function and optimizes the system by finding the minimum objective function value subject to the identified constraints. Figure 3 shows the overview of the proposed solution methodology. The solution of the optimization problem was obtained using the MOGA. The GA is a well-established algorithm that came into wide use in the early 1990s [25]. GAs (Goldberg 1989) are search algorithms based on the principles of natural genetics and evolution.
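As a rough illustration of the GA-based loop just introduced (not the authors' MOGA), the sketch below evolves a binary design vector, applies the constraints of this section as a feasibility filter with a large penalty, and stops on either a maximum generation count or a below-tolerance improvement. The evaluate and feasible callbacks, the penalty value and the crossover/mutation settings are all assumptions.

```python
import random

def evolve(pop_size, n_vars, evaluate, feasible, max_gens=200, tol=1e-4):
    """Single-objective GA sketch: `evaluate` returns the COSTINDEX of a 0/1
    design vector, `feasible` bundles the constraint flags (RadialFlag,
    concheck, Eqs. (7)-(11)). Infeasible candidates receive a large penalty."""
    pop = [[random.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best, best_val = None, float("inf")
    for gen in range(max_gens):                       # stop 1: max generations
        scored = sorted(((evaluate(ind) if feasible(ind) else 1e9, ind)
                         for ind in pop), key=lambda s: s[0])
        if scored[0][0] < best_val - tol:
            best_val, best = scored[0][0], scored[0][1]
        elif gen > 0:
            break                                     # stop 2: improvement below tolerance
        parents = [ind for _, ind in scored[:pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_vars)
            child = a[:cut] + b[cut:]                 # one-point crossover
            if random.random() < 0.05:                # bit-flip mutation
                j = random.randrange(n_vars)
                child[j] = 1 - child[j]
            children.append(child)
        pop = parents + children
    return best, best_val
```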
Figure 4 shows the flow chart of the GA-based algorithm for solving the optimization problem. The stopping criteria, mentioned in Fig. 4, determine when to stop the GA. They include reaching the maximum number of iterations, obtaining a solution that meets the maximum tolerance in comparison to the previous solution, reaching the maximum number of population generations, etc. GAs have proven to be a useful approach to a wide variety of optimization problems. Being a population-based approach, the GA is well suited to solving multi-objective optimization problems. In this work, MOGA is applied to solve the proposed multi-objective, single-representation DSE planning problem. Fig. 3. Overview of the solution methodology 5 Case Studies Test cases have been performed to demonstrate the viability and effectiveness of the proposed model and the optimized solution obtained. Two test cases were chosen and are presented in this paper: the 14-node and the 37-node test systems. The first test case, the 14-node test system, is presented in detail with a deep analysis of its parameters and the obtained solutions. The proposed model was examined on this test case using two basic scenarios. The first scenario, called Case (a), reflects a case in which all line sections have outage durations of less than 12 h. The second scenario, called Case (b), reflects a case in which two line sections were modified to have outage durations of more than 12 h. Fig. 4. Genetic algorithm flow chart The second test case, the 37-node test case, is very similar to the first one and is therefore presented only briefly, with some discussion of its results. This test case was modified from the typical IEEE 37-node test case to reflect a balanced system, as well as the addition of a large DG connected directly to the existing substation. Both test cases obtained good results, with clear improvement from the proposed combination of cost and performance parameters in the objective function. 5.1 Test Case I: The 14-Node Test System Several scenarios were used for testing the algorithm on the 14-node test system. The first scenario provides parameters such that the SRI index is zero, which means that all line sections require less than 12 h to repair in case of an outage. The parameters of this test case are presented in Table 1. The failure rate and duration of outages are functions of each line section's age, installation quality, environment, erosion factors and location. These numbers are typical values for test purposes only and can be modified as required.

Table 1. 14-node test case line parameters
Line section no. | From | To | Conductor size (AWG) | Length (m) | Original status | Modified status | Failure rate (failures/year) | Duration of outage (h/year)
1 | 1 | 10 | 556.5 | 7290 | 1 | 1 | 1 | 0.2
2 | 2 | 10 | 556.5 | 5180 | 1 | 1 | 1 | 0.3
3 | 3 | 10 | 556.5 | 24,390 | 1 | 0 | 1 | 0.5
4 | 3 | 11 | 556.5 | 700 | 0 | 1 | 1 | 8.5
5 | 8 | 11 | 556.5 | 4530 | 0 | 1 | 1 | 0.7
6 | 9 | 11 | 556.5 | 1625 | 0 | 1 | 1 | 0.9
7 | 1 | 6 | 350 | 7320 | 1 | 0 | 2 | 1
8 | 2 | 4 | 350 | 5260 | 1 | 1 | 5 | 1.5
9 | 2 | 5 | 350 | 4770 | 1 | 1 | 2 | 0.6
10 | 2 | 7 | 350 | 6250 | 0 | 1 | 3 | 0.4
11 | 6 | 8 | 350 | 1890 | 1 | 1 | 7 | 7
12 | 7 | 8 | 350 | 4630 | 1 | 0 | 4 | 0.9
13 | 8 | 9 | 350 | 1000 | 1 | 0 | 2 | 0.3
14 | 12 | 13 | 556.5 | 700 | 0 | 0 | 1 | 0.2
15 | 3 | 12 | 556.5 | 725 | 0 | 0 | 1 | 0.6
16 | 12 | 14 | 556.5 | 121 | 0 | 0 | 1 | 0.4
17 | 5 | 13 | 350 | 3850 | 1 | 1 | 2 | 6
18 | 7 | 14 | 350 | 5100 | 1 | 1 | 3 | 1.1
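To make the use of these parameters concrete, the sketch below maps a few of the Table 1 rows onto a graph model, attaching the failure rate and outage duration as edge attributes that could later feed the NFRI/NFDI terms; the networkx dependency and the attribute names are illustrative assumptions only.

```python
# Illustrative mapping of a few Table 1 rows onto a graph model.
import networkx as nx

lines = [
    # (section, from_node, to_node, modified_status, failure_rate, outage_h)
    (1, 1, 10, 1, 1, 0.2),
    (4, 3, 11, 1, 1, 8.5),
    (11, 6, 8, 1, 7, 7.0),
]

G = nx.Graph()
for sec, u, v, status, rate, dur in lines:
    if status:                                   # only in-service sections
        G.add_edge(u, v, section=sec, failure_rate=rate, outage_h=dur)
```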
The data for the existing and proposed substations of this system are shown in Table 2. The original 14-node test system and the optimized system are both shown in Fig. 5. While Case (a) presents the original system that required attention from the planner, Case (b) presents the proposed modified system after applying the objective functions and all constraints. Figure 5, Case (a), shows the original system, which was a radial system supplied by substation 10. As a result of the load growth of the system, two feasible substations are proposed in two different locations, each with three emerging feeders to supply the load growth. It is required to select only one substation and determine the optimum system configuration that minimizes the overall costs while achieving the best reliability indices. After running the proposed algorithm on this system using all constraints, the result becomes Case (b), which proposes the transfer of four nodes from substation 10 to substation 11. Substation 11 is the selected candidate, and the system can now operate using the proposed configuration.

Table 2. Substations of the 14-node test system
Substation node ID | Capacity (kVA) | Capital cost ($k) | O&M annual costs ($k) | Existing/New
10 | 1000 | 7500 | 100 | Existing
11 | 2500 | 4000 | 150 | New
12 | 1500 | 1200 | 60 | New

Fig. 5. 14-node existing and modified test systems Table 3 presents the cost and performance values for both cases shown in Fig. 5. It is clear that, in order to improve the system performance and change it from a fully radial system to an open-loop system, there is an increase in costs. The open-loop system operates in a radial fashion with internal open line sections, called ties, used mainly during contingencies. The cost increased by approximately 1.5 times; however, there is a significant improvement in URI and SVI, which represent the system performance in this case. The reason SRI shows zero values in this case is that none of the line sections has been marked with a failure duration of more than 12 h. To further test the system, another run was made on the same test case after changing some of the line section failure durations to more than 12 h. For comparison purposes, the result of this new modified case is also presented in Table 3. Introducing SRI to the optimization process changes its result. In the same original test case, presented in Fig. 5, Case (a), the outage durations of line sections 5 and 6 were increased to 13 and 15 h, respectively. If Case (b) were maintained, its SRI would have become 3.3 and the total objective function value would have been 18.6. The impact of this failure-duration change was that the optimization algorithm chose substation 10 to be in service, as part of fulfilling the constraint of keeping the existing substation, and substation 12, instead of substation 11 of Case (b), as part of the solution to the expansion problem. As a result, line sections 14 and 16 were also recommended to be in service, and the final configuration became as seen in Fig. 6. This shift in substation choice is logical, as the algorithm tried to avoid supplying the system through the high-failure-duration sections 5 and 6. SRI still measures zero because these two lines were avoided.
Table 3. 14-node test case comparison of objective values
Index | Original system, Case (a) | Failure durations less than 12 h, Case (b) | Failure durations greater than 12 h on lines 5 and 6
F | 1.0312 | 1.5578 | 1.1894
URI | 19.6933 | 4.2031 | 4.956
SRI | 0 | 0 | 0
SVI | 11.0736 | 5.7909 | 6.3029
COSTINDEX | 31.7981 | 11.5518 | 12.4483

As expected, the total objective function value of the new case is higher than that of Case (b). However, the final values are still much better than the original objective function value, with substation 10 supplying the entire system. 5.2 Test Case II: The 37-Node Test System Similar to the 14-node test system, a 37-node test system was also used and analyzed to test the proposed optimization solution. In this test system, there is one existing substation supplying the entire load of the system, and there are three proposed substations in different locations and at different distances from the existing line sections. The 37-node test system is shown in Fig. 7. Fig. 7. The 37-node test system While the 14-node test system did not contain a DG connected to the system, the 37-node test system has a DG connected directly to the existing substation tied to node 4. The parameters of this 37-node test system are similar to those of the 14-node test system, except with a larger number of substations and line sections. Proposed substations are numbered 38, 39 and 40, and they represent three different locations with three and four feeders, as shown in Fig. 7. Due to the size of this test case, and to avoid crowded figures, only the original system is presented. Fig. 6. Modified network supplied from substations 10 and 12 after increasing the outage durations of line sections 5 and 6. The indices and the objective function value of the original and the optimized systems are shown in Table 4.

Table 4. 37-node test case comparison of objective values
Index | Original system case | Optimized system case
F | 1.0475 | 1.3425
URI | 6.0262 | 5.1416
SRI | 3.0495 | 3.0487
SVI | 5.2298 | 4.4474
COSTINDEX | 15.353 | 13.98
Line sections to be closed: 40, 41, 42 and 43; line sections to be opened: 4, 9 and 23.

While the SRI value remained almost the same, the cost (F) deteriorated and URI and SVI improved, hence improving the total objective function value. In this test case, line sections 10 and 39 are assumed to have failure durations of 15 and 18 h per failure per year. Since line section 39 is the main line from one of the proposed substations to the system, the algorithm was able to avoid it in the optimization process by excluding substation 38 from the selection. Line 10, however, is on the pathway of all substations; hence it was selected in all options and is unavoidable when optimizing the system. As in the original case, line 10 is also one of the main components of the system and cannot be set to open. Therefore, the improvement in SRI was marginal, as the algorithm searched for a lower value by avoiding other line sections with fewer customers, given its inability to change the failure duration and set this line section to open. 6 Conclusion In this work a new model for the DSE problem was proposed. The new model combines three performance indicators with the commonly used cost function in a multi-objective formulation. Seven constraints were used in the solution for the first time.
The new proposed model demonstrates its viability to arrive to an optimum solution considering the modern approaches of smart grids including per-formance when planning the expansion of distribution systems. After testing the model on two test systems with variable parameters it can be concluded that the model is a practical implementable model that proposes a solution suitable for ?nding a trade-off between cost and performance. The model can be easily applied in utilities and is recommended to be used by planners to help them make the best investment decisions. References 1. Luong, N.H., Grond, M.O.W., La Poutre, H., Bosman, P.A.N.: Scalable and practical multi-objective distribution network expansion planning. In: IEEE Power and Energy Society General Meeting (2015) 2. Vaziri, M., Tomsovic, K., Bose, A., Gonen, T.: Distribution expansion problem: formulation and practicality for a multistage globally optimal solution. In: IEEE, Power Engineering Society Winter Meeting (2001) 3. Cossi, A.M., da Silva, L.G., La Zaro, R.A.R., Mantovani, J.R.S.: Primary power distribution systems planning taking into account reliability, operation and expansion costs. In: IEEE, The Institute of Engineering and Technology (IET) Generation, Transmission and Distribution, no. ISSN 1751-8687 (2011). https://doi.org/10.1049/iet-gtd.2010.0666 4. de Souza, J., Rider, M.J., Mantovani, J.R.S.: Planning of distribution systems using mixed-integer linear programming models considering network reliability. J. Control Autom. Electr. Syst. (2015). https://doi.org/10.1007/s40313-014-0165-z 5. Mazhari, S.M., Monsef, H., Romero, R.: A multi-objective distribution system expansion planning incorporating customer choices on reliability. IEEE Trans. Power Syst., 1330–1340 (2015). https://doi.org/10.1109/TPWRS.2015.2430278 6. Muñoz-Delgado, G., Contreras, J., Arroyo, J.M.: Reliability assessment for distribution optimization models: a non-simulation-based linear programming approach. In: IEEE, Power and Energy Society General Meeting (2017) 7. IEEE std. 1366-2012 IEEE Guide for Electric Power Distribution Reliability Indices. IEEE Power and Energy Society (2013) 8. Zare-Bahramabadi, M., Abbaspour, A., Fotuhi-Firuzabad, M., Moeini-Aghtaie, M.: Resilience-based framework for switch placement problem in power distribution systems. IET Gener. Transm. Distrib. 12(5), 1223–1230 (2018). https://doi.org/10.1049/iet-gtd.2017. 0970 Economic and Performance Based Approach to DSE Planning Problem 331 9. Chee-Wooi, T., Chen-Ching, L., Govindarasu, M.: Vulnerability assessment of cybersecurity for SCADA systems. IEEE Trans. Power Syst. 23(4), 1836–1846 (2008) 10. Johansson, J.: Risk and vulnarability analysis of large-scale technical infrastructures. Ph.D. thesis, Media-Tryck, Lund University, Lund, Sweden, Lund, Sweden (2007) 11. Chen, J., Peng, M., Gao, X., Li, G.: Multi-objective distribution network planning considering invulnerability. In: IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China (2017) 12. Yang, X.-S.: Nature-Inspired Optimization Algorithms, Wlatham. Elsevier Inc., New York (2014) 13. Ramírez-Rosado, I.J., Bernal-Agustín, J.L.: Genetic algorithms applied to the design of large power distribution systems. IEEE Trans. Power Syst. 13(2), 696–703 (1998) 14. Yang, X.-S.: Nature-Inspired Optimization Algorithms. Elsevier Inc., New York (2014) 15. 
Pereira Jr., B.R., Contreras, J., Mantovani, J.R.S., Cossi, A.M.: Multiobjective multistage distribution system planning using tabu search. In: IEEE, The Institute of Engineering and Technology (IET) Generation, Transmission and Distribution, no. ISSN 1751-8687 (2013). https://doi.org/10.1049/iet-gtd.2013.0115 16. Coello, C.A.C.: An updated survey of GA-based multiobjective optimization techniques. ACM Comput. Surv. 32(2), 109–143 (2000) 17. Turkay, B.: Distribution system planning using mixed integer programming. In: ELEKTRIK, Istanbul, Tubutak Emo, vol. 6, no. 1 (1998) 18. Gonen, T., Ramirez-Rosado, I.J.: Optimal multi-stage planning of power distribution systems. IEEE Trans. Power Deliv., 512–519 (1987). https://doi.org/10.1109/TPWRD.1987. 4308135 19. Sindi, H., El-Saadany, E.: Uni?ed reliability index development for utility performance assessment. Intell. Ind. Syst. 2(2), 149–161 (2016) 20. Aghaei, J., Muttaqi, K.M., Azizivahed, A., Gitizadeh, M.: Distribution expansion planning considering reliability and security of energy using modi?ed PSO algorithm. University of Wollongong Research online, Faculty of Engineering and Information Sciences papers, Wollongong, Australia (2014) 21. Kumar, V., Krishan, R., Sood, Y.R.: Optimization of radial distribution networks using path search algorithm. Int. J. Electron. Electr. Eng. 1(3), 182–187 (2013) 22. Abdelaziz, A.Y., Osama, R.A., El-Khodary, S.M.: Recon?guration of distribution systems for loss reduction using Hyper-Cube Ant Colony optimization algorithm. IET Gener. Transm. Distrib. 6(2), 176–187 (2012) 23. Balakrishnan, R., Ranganathan, K.: A Textbook of Graph Theory, New York. Springer, New York (2013) 24. Floyd, R.W.: Algorithm 97: shortest path. Mag. Commun. ACM 5(6), 345–350 (1962) 25. Heidari, S., Fotuhi-Firuzabad, M., Kazemi, S.: Power distribution network expansion planning considering distribution automation. IEEE Trans. Power Syst. 30(3), 1261–1269 (2015) 332 H. Zaki et al. Connecting to Smart Cities: Analyzing Energy Times Series to Visualize Monthly Electricity Peak Load in Residential Buildings Shamaila Iram1(?) , Terrence Fernando2 , and Richard Hill1 1 University of Hudderts?eld, Hudderts?eld, UK S.Iram@hud.ac.uk 2 University of Salford, Greater Manchester, UK Abstract. Rapidly growing energy consumption rate is considered an alarming threat to economic stability and environmental sustainability. There is an urgent need of proposing novel solutions to mitigate the drastic impact of increased energy demand in urban cities to improve energy e?ciency in smart buildings. It is commonly agreed that exploring, analyzing and visualizing energy consump- tion patterns in residential buildings can help to estimate their energy demands. Moreover, visualizing energy consumption patterns of residential buildings can also help to diagnose if there is any unpredictable increase in energy demand at a certain time period. However, visualizing and inferring energy consumption patterns from typical line graphs, bar charts, scatter plots is obsolete, less infor- mative and do not provide deep and signi?cant insight of the daily domestic demand of energy utilization. Moreover, these methods become less signi?cant when high temporal resolution is required. In this research work, advanced data exploratory and data analytics techniques are applied on energy time series. Data exploration results are presented in the form of heatmap. Heatmap provides a signi?cant insight of energy utilization behavior during di?erent times of the day. 
Heatmap results are articulated from three analytical perspectives; descriptive analysis, diagnostic analysis and contextual analysis. Keywords: Energy e?ciency · Smart buildings · Data analytics · Heatmap 1 Introduction In recent years, energy data analytics has got tremendous attention of researchers, econ- omists, industrialists, and policy makers from all over the world. This could be because of the shortage of natural resources, environmental destruction, or proliferation of energy demand due to the development of urban cities. Confronted, with this rapid increase of energy demand, the researchers and scientists are ?nding greater interest to design and develop advanced techniques and methods that can help us to cope with energy crises or at least to mitigate its worst consequences. Moreover, the rapidly increasing energy consumption rate poses an alarming threat to the worldwide environmental sustainability and economic stability. International Energy Agency’s (IEA) statistics reveal that 32% of the total ?nal energy is being © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 333–342, 2019. https://doi.org/10.1007/978-3-030-02686-8_26 consumed by the buildings [1]. This percentage is even higher in non-industrial areas. The fact that how people consumes energy depends on human behaviour and other social, economic, environmental and geographical factors [2]. In recent years, energy e?ciency and saving strategies have become a priority objective for energy policies due to the proliferation of energy consumption and CO2 emission in the built environment. According to statistics 40% of all primary energy is being consumed in and by the buildings [3]. International Energy Agency (IEA) in [5] claims that “Energy e?ciency is a critical tool to relieve pressure on energy supply and it can also mitigate in part the competitive impacts of price disparities between regions”. Analyzing energy patterns and identifying variations in energy usage with the help of data mining techniques will help to build energy e?cient buildings. It is evident in the past 40 years that increasing energy e?ciency of the buildings helps not only to combat the climate changes but also to reduce the energy consumption [4]. Furthermore, this research work presents a framework that brings multi-domain knowledge to an interdisciplinary project to solve the unaddressed or partially addressed issue in the domain of energy e?cient smart buildings. In doing so, this research work elucidates the importance of mapping multi- domain experts’ opinion to develop the new policies in deploying the signi?cant changes. This new approach that combines social, economic, behavioural and psychological, environmental, statistical and compu- tational phenomena o?ers a dynamic and compelling framework for designing energy e?cient buildings. This research work also acts as a bridge to ?ll the communication gap between research community and the policy makers to make intelligent decisions based on scienti?c evidence. 1.1 Times Series Analysis In time series analysis concern lies in forecasting a speci?c quantity given that the variations in that quantity over time are already known. While, other predictive models that do not involve time series mainly focus on analysing a cross-sectional area of the data which do not have time variance component. 
As stated by Hilda et al., in [6], “When a variable is measured sequentially in time over or at a ?xed interval (sampling interval) the resulting data represents a time series”. They further elaborated that time series is a collection of observations arranged in a natural order where each observation is asso- ciated with a particular instance or interval of time. More speci?cally, time series, compared to common data, holds natural temporal ordering where common data does not necessarily have natural ordering of the obser- vations. Furthermore, Millan et al. [7] de?ned time series analysis as a process of using statistical techniques to model and explain a time-dependent series of data points. Whereas, time series forecasting uses a prediction model to forecast the future events based on the past events. This research work also presents the application of di?erent kinds of analytical and visualization techniques to understand energy utilization patterns in residential building. Data analytical results are visualized in the form of heatmap. Heatmap results are articu- lated from three di?erent analytical perspectives as descriptive analysis, diagnostic analysis and contextual analysis. Rest of the paper is structured as: State of the art work 334 S. Iram et al. is presented in Sect. 2 followed by methodological framework in Sect. 3. Exploratory data analytical techniques are elaborated in Sect. 4; whereas Sect. 5 details the data that is used in this research work along with data preprocessing techniques. Application of heatmap examples are explained in Sect. 6. Section 7 provides brief summary of the work along with conclusion and future research work. 2 Literature Survey Platchkov and Pollitt in their paper [8] critically analysed and overviewed the longer run trends of increasing global electricity demands and explain the potential impact in the UK electri?cation. They claimed that the underlying resources cost for the energy that is being used in di?erent times of the day or the year changes accordingly. For instance, on an o?-peak day the price per megawatt hour (MWh) in the power market does not rise above £50/MWh, however, on the peak day the price may reach to £800 for half hour periods across a 24-h period. This implies that, for median days there is a comparatively great incentive of using electricity during night time. The main emphasis of their research work is that the demand will increase steadily over time but the possible coping solution is to shift the energy demand to o?-peak time. Therefore, a small demand response, either by reducing the consumption or by shifting it to the cheaper time can make a signi?cant di?erence in cost for residential as well as for commercial buildings. This shows the signi?cance of shifting demand to o?- peak time which is also called load balancing. Furthermore, ?guring out the factors that trigger the peak energy demand for a speci?c period of time in a building could poten- tially help to improve building’s heating, ventilation and air conditioning (HVAC) system. Together with this, sudden peak in energy consumption can be because of some mal-functioning or some unexceptional human behavior. Finding possible causes of high energy demand for a certain period of time can possibly lead to ?nd appropriate solutions for it and ultimately a control in energy demand. Understanding this demand and supply behavior in residential areas will further support the sustainable and renewable energy technology. 
David in his paper [2] states that selecting key variables and interactions is therefore an important step in achieving more accurate predictions, better interpretations, and identi?cations of key subgroups in the energy datasets for further analysis. Jenkins et al. [8] visualize energy data to examine the monthly demand of substations and synthesized equivalent. Walker and Pokoski [9] developed a model of residential electric load where they introduced the psychological factors based on a person’s availability that can a?ect the individual use of electrical appliances at a given time. Before that, in early nineties, Capasoo et al. [10] applied bottom up approach to develop “Capasoo Model”. This model uses the socioeconomic and demographic data, for instance, the stock of appliances and their usage pattern in a household to model a load curve. This load shape shows the relationship between the demand of residential customers and the psychological and behavioral factor of the house occupants. Later in 2002, [11] Willis used the bottom up approach to model the typical demand forecasting scheme for the individual customers. Connecting to Smart Cities 335 3 Methodological Framework The proposed methodological framework as shown in Fig. 1, for energy e?cient smart buildings, provides foundation for complex, diverse, contextually aware, eco-driven and intelligently monitored nature of energy demand that frequently requires a multi domain, interdisciplinary approach into research. This framework articulates the energy e?- ciency paradigm with respect to four signi?cant attributes that should be considered to improve end-use energy e?ciency and to reduce energy demand. The embedded features are predicated on the issues related to global climate change, social behavior, economic productivity, and modelling the exceptionally large energy datasets to explore and inter- pret the interesting, useful patterns of energy usage. Fig. 1. A methodological framework for cross disciplinary knowledge exchange to exploit the design and development of energy e?cient smart buildings. The ?rst crucial step to achieve a particular milestone is to identify and analyze the problems, issues and concerns of di?erent stakeholders in order to develop a shared vision with common understanding and clear targets. The most important factor that should be considered in constructing the smart buildings or smart cities is “human beings”, which means, everything that we construct should be human oriented. Creating a comprehensive roadmap will help us to focus on high-return predictive analytics with clear pre-de?ned destinations and achievable milestones which is a starting point for gaining a better understanding of customer’s requirements. Hence, as a part of this research work, one of the milestones is to classify the prereq- uisites to provide a foundation to develop a globally acceptable socio-technical strategy for building the smart buildings and smart cities. This will help to tackle all the issues that are in mutual interests of di?erent stakeholders. Since, this is a long term ongoing project, this ?rst part of the research work has already been accomplished and published [12]. 336 S. Iram et al. Our next research question is what is the role of data science in the design and development of energy e?cient smart buildings. 
In this research work, advanced analytical methods and visualization techniques are used to explore complex energy datasets in order to understand the energy consumption patterns of a residential building. 4 Data Exploration: A Possible Solution Data can be explored, analyzed, visualized and described at different levels of maturity. Most of the existing literature reveals four informative levels of data exploration, depending on the complexity of the case studies under question. These are recognized as descriptive analysis, diagnostic analysis, predictive analysis and prescriptive analysis [1]. However, what is mostly neglected in most case-study analyses is understanding the circumstances in which a particular thing has happened. This is usually called contextual awareness. Credibility of the results can only be attained by linking the outcome of a particular analysis with the situation in which it occurs. We are recommending contextual analysis as a complementary method for describing any analytical results. Therefore, data analytical types can be described from five different perspectives, as listed in Table 1.

Table 1. Data exploration types, description and examples
Analytic type | Description | Example
Descriptive analysis | What is happening? | Historical data reports
Diagnostic analysis | Why did it happen? | Fault detection
Predictive analytics | What is likely to happen? | Cost prediction
Prescriptive analysis | What should we do about it? | Cost optimization
Context analysis | In which circumstances did this happen? | Situation dependency

As mentioned earlier, this research work aims to understand energy utilization patterns in a residential building and to identify any unusual data behavior and its reasons. Hence, the analysis will be carried out from three different perspectives: • Understanding energy utilization patterns → Descriptive analysis • Identifying extreme or abnormal data values → Diagnostic analysis • Finding the root cause of normal and extreme behavior → Context analysis 5 Data Description For this preliminary research, data is collected for 32 different houses in the area of Manchester, across different domains. In the domain of Building Information, data is collected on the archetype of the buildings, their age, addresses as longitude and latitude, class, construction type, ownership, floor area and air test. Fifteen different building archetypes were found in that area, named as BISF, Brick and block, Detached 1980s brick and block, End terrace pre-1919 solid wall, Flat Wimpey no-fines non-trad, Mid terrace pre-1919 solid wall, Semi-detached pre-1919 solid wall, Semi-detached 1919 solid wall, Semi-detached 1920s solid wall, Semi-detached 1930s solid wall, Semi-detached 1970s brick and block cavity, Semi-detached pre-1800 brick, Terraced pre-1919 solid wall and Wates. The age of the building is categorised as 1920s, 1930s, 1950s, 1960s, 1970s, 1980s, pre-1800 and pre-1919. Classes are defined as Detached, End-terraced, Flats, Mid-terraced and Semi-detached. Construction type is recognized as Traditional or Non-traditional. Floor area is measured in square meters (m2) and is further classified into three sections: Small (<50 m2), Medium (50–100 m2) and Large (>100 m2). Air permeability results for the air leakage test are categorised into three sections: <5 m3/(m2·h), 5–10 m3/(m2·h), and >10 m3/(m2·h).
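As an illustration of how such categorical bands can be derived from the raw measurements, the following sketch bins floor area and air-permeability values with pandas; the column names and sample values are invented for the example.

```python
# Illustrative banding of building attributes into the categories listed above.
import pandas as pd

buildings = pd.DataFrame({
    "floor_area_m2": [42.0, 76.0, 128.0],
    "air_permeability": [4.2, 7.8, 11.5],     # m3/(m2.h)
})
buildings["area_band"] = pd.cut(buildings["floor_area_m2"],
                                bins=[0, 50, 100, float("inf")],
                                labels=["Small (<50)", "Medium (50-100)", "Large (>100)"])
buildings["air_band"] = pd.cut(buildings["air_permeability"],
                               bins=[0, 5, 10, float("inf")],
                               labels=["<5", "5-10", ">10"])
print(buildings)
```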
Demographic information collected in the domain of Human Information comprises age, gender, family composition and health status. Family composition is further categorised as single occupants, working couples, small family, small family of three, family of four, family of five, family of six, retired singles, retired couples, family of five with retired couples, and short-term occupants with complex needs. In the Services domain, data is collected for electricity and gas usage in kWh/m2 for one complete year. Electricity data is clustered into three sections: <35 kWh/m2, 35–40 kWh/m2, and >40 kWh/m2, whereas gas data is also clustered into three sections: <120 kWh/m2, 120–140 kWh/m2, and >140 kWh/m2. 5.1 Data Preprocessing To understand the data distribution, to find any outliers caused by extreme external behavior or malfunctioning sensor devices, and to prepare the data for analyzing and visualizing the heatmap, the energy dataset is preprocessed. First, the cumulative distribution function (CDF) is applied to the datasets to understand the probability of the random variables in the datasets. Equations (1) and (2) give the true CDF F(t) and its empirical estimate $F_n(t)$, which is found by making no assumptions about the underlying distribution:

$F(t) = P(X \le t)$   (1)

$F_n(t) = \dfrac{\#\{\text{sample values} \le t\}}{n}$   (2)

Figure 2(a) is the visual representation of the CDF of the temperature dataset for the whole building over one month, including the hallway, lounge and bedrooms. Figure 2(b) shows a boxplot diagram used to identify extreme data behavior, which is sometimes caused by a malfunction in the devices. Fig. 2. (a) Cumulative distribution of the dataset. (b) Outlier identification with a boxplot diagram. The temperature dataset is collected for one complete year for all 40 buildings. However, to keep the analysis and visualization simple for this research work, a dataset of one month (January) is selected for one residential building. The dataset is prepared by applying functions from R packages such as lubridate and timeseries, and the R classes POSIXct and POSIXlt. After discussion, it was decided to resample the datasets to a coarser timestamp to remove any suspicious or null values. The temperature dataset was originally collected every five seconds, 24 h a day, for one year. To reduce the probability of outliers, the dataset was converted to half-hour intervals. This removed the probability of any extreme or malfunction-driven data behavior that could affect the results. After that, the heatmap algorithms are designed using the R package ggplot2. Details of the heatmap application are given in the next section. 6 Peak Identification – Heatmap Example Once the data is preprocessed and cleaned, the next step is to visualize the energy utilization patterns of a residential building. For this, we selected a building occupied by a working couple. The idea is to understand the usual behavior of energy utilization for each day of a month. Apart from identifying the occupants' energy use behavior, the intention was also to diagnose whether there are any extreme or unusual data patterns in the datasets. As explained earlier, the R library ggplot2 is selected to design the heatmap algorithm. Figure 3 provides a visual representation of the heatmap data values, which are categorized from 0–2000 KWH, with a color bar of dark blue, red and yellow, where dark blue represents the lowest data value and yellow represents an extreme data value.
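The paper's preprocessing and heatmap are implemented in R (lubridate, ggplot2). As an equivalent, hedged sketch only, the following Python code resamples high-frequency readings to half-hour means, pivots them into a day-by-half-hour matrix and renders it as a heatmap; the file name, column names and colour map are assumptions rather than the authors' choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

raw = pd.read_csv("house_january.csv", parse_dates=["timestamp"])
raw = raw.set_index("timestamp").sort_index()

# Resample the 5-second readings to half-hour means to suppress outliers.
half_hourly = raw["energy"].resample("30min").mean()

# Pivot into a day-of-month x half-hour-of-day matrix for the heatmap.
frame = half_hourly.to_frame("energy")
frame["day"] = frame.index.day
frame["slot"] = frame.index.hour * 2 + frame.index.minute // 30
matrix = frame.pivot_table(index="day", columns="slot", values="energy")

plt.imshow(matrix, aspect="auto", cmap="plasma", origin="lower")
plt.xlabel("Half-hour of day (0-47)")
plt.ylabel("Day of month")
plt.colorbar(label="Energy")
plt.show()
```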
Each data point in the heatmap presents a data value for half an hour which extends from 1 https://www.r-project.org/. Connecting to Smart Cities 339 0–24 h. However, y-axis represents each day of the moth. Heatmap will help us to perform descriptive, diagnostic as well as contextual analysis. Fig. 3. Heatmap example to diagnose regular and extreme data behavior for a residential building. As we can visualise in Fig. 3, there are some regular and some irregular energy utilisation patterns for each day in the whole month. As we can see in the ?gure, from 11:00 PM to 7:00 AM the data values range comes within blue band, which identi?es low energy usage at that time which is highlighted as night time in the ?gure. Then from 7:00 AM to around 11:30 AM there is comparatively higher usage of electricity which is probably due to the fact that everyone in the home is using electricity for normal house hold activities at that time of the day. This can be visualised as red colour squares in the ?gure. Then during the day time, again there is not much activity at home as compare to the night time. This probably because they have left the house for work. Then, between time span 5:30 PM to 11:00 PM higher energy consumption could be visualised when usually everybody is at home and is engaged with di?erent activities at home. Moreover, this is also evident from the description above that by linking the descrip- tion of analytical results with its particular context actually helps to understand the reasons of least and higher electricity consumption at particular time of the day. Apart from a normal energy utilisation patterns, some extreme data behaviour could also be visualised in the heatmap. For instance, all yellow points in the map tell us some extreme or abnormal energy utilisation behaviour. This implies that there could be some abnormality in the devices integrated in the house or this could be because of some unusual behaviour of the residents. Identifying abnormal or extreme behaviour in energy consumption patterns is called diagnostic analysis of the data. This also implies that further investigation could be recommended to ?nd the root cause of such extreme behaviours that are the reasons of extreme energy utilisation. 340 S. Iram et al. 7 Summary and Conclusion Increased energy demand in residential as well as in commercial buildings in recent years is deteriorating our natural energy resources and whole eco system. New and e?ective solutions are required to control higher rate of energy consumption in the buildings. This research work proposed a holistic multidisciplinary framework to exchange knowledge and understanding from di?erent domains for the design and development of sustainable energy e?cient buildings. This framework also presents the collaboration model to share knowledge among di?erent stakeholders and knowledge experts to implement e?ective policies that help to improve energy e?ciency. This research work focuses on exploring data science techniques to understand users’ energy consumption patterns in residential buildings. Electricity data is collected from 32 di?erent residential buildings for one year. Raw data is visualized using Cumulative Distribution Function to understand its graphical distribution. However, boxplot diagrams are used to visualize outliers in the dataset. Dataset is re-sampled for di?erent timestamp to eliminate the probability of unwanted data values. 
Once data was prepro- cessed, heatmap algorithm is designed and implement to understand electricity consumption patterns for one residential building. Descriptive analytical method is used to elaborate the results of the heatmap. However, unusual or extreme energy utilization behavior is noticed in the energy consumption pattern and elaborated using diagnostic analytical method. Contextual analysis of the results helps to understand the rationale behind normal and unusual energy consumption patterns. Peaks were identi?ed in the heatmap that tell us some extreme behavior of energy consumption. This, sometimes, could be because of any fault in the integrated devices at home. However, this also recommends to understand residents own behavior to use energy at home. Energy analysis results reinforce our statement that ?guring out the factors that trigger the peak energy demand for a speci?c period of time in a building could poten- tially help to improve building’s heating, ventilation and air conditioning (HVAC) system. Together with this, sudden peak in energy consumption can be because of some mal-functioning or some unexceptional human behavior. Finding the possible causes of high energy demand for a certain period of time can possibly leads to ?nd appropriate solutions for it and ultimately a control in energy demand. Understanding this demand and supply behavior in residential areas will further support the sustainable and renew- able energy technology. As part of future research work, authors intend to explore di?erent data analytical techniques that could be used to analyze stakeholders’ requirements that they want to be integrated in smart buildings. References 1. Fan, C., Xiao, F., Wang, S.: Development of prediction models for next-day building energy consumption and peak power demand using data mining techniques. Appl. Energy 127, 1– 10 (2014) Connecting to Smart Cities 341 2. Hsu, D.: Identifying key variables and interactions in statistical models of building energy consumption using regularization. Energy 83, 144–155 (2015) 3. Pérez-Lombard, L., Ortiz, J., Pout, C.: A review on buildings energy consumption information. Energy Buildings 40(3), 394–398 (2008) 4. Pacala, S., Socolow, R.: Stabilization wedges: solving the climate problem for the next 50 years with current technologies. Science 305(5686), 968–972 (2004) 5. Internation Energy Agency (IEA), World Energy Outlook 2015, OECD/IEA, Editor, Paris (2014) 6. Kosorus, H., Honigl, J., Kung, J.: Using R, WEKA and RapidMiner in time series analysis of sensor data for structural health monitoring. In: 22nd International Workshop on Database and Expert Systems Applications (DEXA), pp. 306–310. 29 Aug.-2 Sept., IEEE, France (2011) 7. Millan, P., et al.: Time series analysis to predict link quality of wireless community networks. Comput. Netw. 93(2), 342–358 (2015) 8. Platchkov, L.M., Pollitt M.G.: The Economics of Energy (and Electricity) Demand Cambridge University, 13–14 May 2011 9. Walker, C.F., Pokoski, J.L.: Residential load shape modelling based on customer behavior. IEEE Trans. Power Appar. Syst. 104(7), 1703–1711 (1985) 10. Capasso, A., et al.: A bottom-up approach to residential load modeling. IEEE Trans. Power Syst. 9(2), 957–964 (1994) 11. Willis, H.L.: Spatial Electric Load Forecasting, 2nd edn. CRC Press, New York (2002) 12. Iram, S., Fernando, T., Bassanino, M.: Exploring cross-domain data dependencies for smart homes to improve energy e?ciency. 
In: Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, pp. 221–226. ACM, USA (2017) 342 S. Iram et al. Anomaly Detection in Q & A Based Social Networks Neda Soltani1(&) , Elham Hormizi2 , and S. Alireza Hashemi Golpayegani1 1 Computer and IT Engineering Department, Amirkabir University of Technology, Tehran, Iran {neda.soltani,sa.hashemi}@aut.ac.ir 2 Computer and IT Engineering Department, University of Science and Technology, Babol, Mazandaran, Iran elham.hormozi@gmail.com Abstract. Detection of anomalies in question/answer based social networks is important in terms of ?nding the best answers and removing unrelated posts. These networks are usually based on users’ posts and comments, and the best answer is selected based on the ratings by the users. The problem with the scoring systems is that users might collude in rating unrelated posts or boost their reputation. Also, some malicious users might spam the discussion. In this paper, we propose a network analysis method based on network structure and node property for exploring and detecting these anomalies. Keywords: Anomaly detection .n Social networks .n Q & A Reputation boosting .n Spam detection 1 Introduction Widespread participation in question and answer sites and answering specialized questions, has led to the creation of massive data collections that are growing rapidly. On the other hand, it’s hard to detect related, correct, and non-spam responses. In order to identify spam, misleading or irrelevant answers that are replied to a question or discussion, it is necessary to analyze these responses. Besides natural-language analysis methods that have many complexities, some of these anomalies can be identi?ed based on the structure of communication between individuals and the content of the posts. For instance, authors of [1] state that spammers create star-like sub-networks. Anomaly means deviation from expected behavior. This means there exists patterns in observed data that do not match the de?nition of normal behavior. In social net-works, anomalies mean interactive patterns that have signi?cant differences from the whole network. In fact, the de?nition of anomaly depends on the nature of the problem. Various types of anomalies could be de?ned in social network environments, depending on the network of question. For example, spam emails are known as anomaly. In a network-based trust system, collusion is identi?ed as another type of anomaly. These are just examples of anomaly types in network structures. Considering the total amount of resources, time, and cost spent on these anomalies, it is necessary to © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 343–358, 2019. https://doi.org/10.1007/978-3-030-02686-8_27 develop solutions to this issue. According to statistics, 67% of email traf?c within the period of January to June 2014 was spam. Also, in 82% of cases, social networks were used for online abuse. These examples indicate the importance of the issue. These anomalies appear as abrupt changes in interactions or interaction, which are completely different from the usual form in a particular network. For instance, subnets that are created for collusion have certain forms of interaction. Another symptom of anomalies is highly interconnected subnets or star-like structures. Solutions that have been pro-posed to detect anomalies in social networks are in two categories: • Checking and comparing the network model with a normal interaction model. 
• Checking network attributes. Therefore, detection of anomalies in social networks involves the selection and calculation of network characteristics, and classi?cation and observation in the char-acteristics space. The ?rst challenge is the de?nition of normal behavior. Social net-works do not have a ?xed and balanced structure in all components due to the diversity of individuals and available nodes; and the de?nition of a normal structure in such networks is not possible. Another issue is that distributes of node degrees and network structure of communities changes over time. The scenarios presented for a normal structure are not necessarily real-time and it’s possible for a network to change before structure is extracted. Anomaly detection includes the following steps [1]: (1) Determining the smallest affected unit by behavior. (2) Identifying characteristics that are different from normal states. (3) Determining the context. (4) Calculation of characteristics and extracting a characteristic space. (5) Calculation of the distance between observations. The difference between anomaly detection in social networks and other areas is that in social networks we have individuals –containing characteristics—and the relation-ships between them—, which are relevant to their characteristics. Networks may be static or dynamic, labeled or not, and local or global; all of which affect the de?nitions in the network, and also a de?nition of anomalies. Therefore, the method used for anomaly detection in a friendship social network does not necessarily have optimal result in an authors’ network. In this paper, we will use social network analysis methods to detect anomalies in content sent by users in a question and answer based social network. To achieve this goal we have to ?rst de?ne the anomaly type; and second, present the detection method based on the network and anomaly properties. Then, we will use network analysis methods to use the presented method on the selected network. The main contribution of this paper is using node properties along with graph structure for detecting anomalies. The remainder of the paper is organized as follows: In the next section literature review is throughout the recent works in this area. In Sect. 3, the problem statement is presented in details. Then, our proposed solution methodology is explained. Section 4 covers the experiments and results of our tests and ?nally, in Sect. 5 we conclude our work and discuss future works. 344 N. Soltani et al. 2 Related Work The types of anomalies in terms of the anomaly detection are in the following cate-gories [1]: Static unlabeled anomalies, Static labeled anomalies, Dynamic unlabeled anomaly, and Dynamic labeled anomaly. Detection of anomalies is critical in preventing malicious activities such as bully-ing, designing terrorist attacks and disseminating counterfeit information. The authors of [2] examined the work that has been done to detect anomalies in social networks and focus on the effects of new anomalies in social media and most new techniques to identify speci?c types of anomalies. There are also a variety of studies on the detection of anomalies, data types and data attributes in the social network, anomalies are detected in network data [3–5, 8], which focus on graph data, including data weights to detect anomalies. 
An “ego-nets” is provided that includes sub-graphs of favorite nodes and neighboring nodes, and an “oddball” sphere regards around each node at the substrate of the adjacent nodes that exists to each node. Then, a small list of numerical features is designed for it. Detection of anomalies in temporary data has been done by [7, 9, 10]. The key idea is to create a Granger graphical model on a reference data set, and using a series of restrictions on the existing model, assuming that there is time dependence as reference data, they test the determined dataset and also speed up detection of anomalies by several random and parallel optimization algorithms. The proposed methods in the referred papers cause the effectiveness of accuracy and stability. In [11], the author discusses about advances in detecting fraud and malformation for social network data, including point anomaly detection. In that, a taxi driving fraud detection system was used. To implement the system, there are a large number of GPS trackers for 500 taxi drivers and systematically, they have investigated counterfeit activities of taxi drivers. The author in [12] uses an algorithm called WSAR3E.0 that can detect anomalies in simulated data with the earliest possible detection time and a low false positive number. It is also discussed in some articles about the detection of group malformations in social networks, applications, and systems. In [13], in order to identify the social implicit relations and close entities in the dataset, a framework has been used to solve similar unusual users in the real-world datasets. This approach requires a model for coping of communications, a model for independent users, and a method for distinguishing between them. In [14], a graphical model called GLAD, which has the ability to discover the group structure of social networks and detect group anomalies and also, required tests are performed on real and unrealistic datasets by anomaly injections. This automatically checks the nodes of a multi-layer network based on the degree of similarity of the nodes to the stars in different layers and by parallelizing the extracted features and anomalous detection operations in different layers of the multi-layer network, signi?- cantly, the calculations have been increased by the distribution of inputs to different machines cores. In [16], the author analyzes the distribution of input times and the volume of events such as comments and displays of online surveys for ranking and detecting suspicious users, such as spammers, bots and Internet fraudsters are being Anomaly Detection in Q & A Based Social Networks 345 used. In this paper, a relative model called VOLTIME is presented that measures the distribution of input times from real users. In another research-based on the idea that most user behavior is divergent from what can be considered as ‘normal behavior’, there is a risk assessment that results in more risks [17]. Because similar users follow a series of similar rules on social net-works, this assessment is organized in two phases: Similar users are ?rst grouped together, then, for each identi?ed group, one or more models are constructed for their normal behavior [18]. Using the recorded sessions to solve the problem of whether each session is abnormal determines the degree of anomalies in each session. Imple-menting robust statistical analyzes on such data is very challenging as the number of observed sessions is much smaller than the number of network users. 
The method put forward in that work detects anomalies in very high-dimensional data using hyper-graphs, an important extension of graphs in which a single edge can simultaneously connect more than two vertices. Table 1 compares the above-mentioned studies.

Table 1. Comparison of recent research on social network anomaly detection.
Reference | Anomaly type | Target network | Method | Node/edge property included
[3–5, 8] | Anomalous nodes | Weighted graph | OddBall, ego-net patterns, hybrid method for outlier node detection | Node and edge (density, weights, ranks and eigenvalues)
[7, 9, 10] | Time-series anomaly detection | Weighted graph | Granger graphical model | Edge, weight
[11] | Point anomaly detection | Weighted graph | Taxi driving fraud detection system | Edge, weight
[12] | Bayesian network anomaly detection | Bayesian network | WSARE 3.0 algorithm, simulation | Edge, time
[13] | Intrusion detection | Graph network | Tribes algorithm | Node
[14] | Group anomaly detection | Graph network | Group Latent Anomaly Detection (GLAD) model, d-GLAD | Node, weight
[15] | Anomalies in multilayer networks | Multilayer social networks | ADOMS (Anomaly Detection On Multilayer Social networks), unsupervised and parameter-free | Node, edge, weight
[16] | Suspicious users | – | VolTime model, unsupervised anomaly detection | Time
[17] | User anomalous behaviors | Online social networks | Two-phase risk assessment approach | Time, node
[18] | Anomaly detection | Weighted graphs | OddBall algorithm | Node, density, weights, ranks

3 Problem Statement and Solution Methodology

As mentioned in the introduction, we are looking for anomalies in the selected Q & A dataset. We limit the anomaly types to spam and reputation-boosting sub-networks. Therefore, the following questions are to be answered on the dataset:

1. Which users submit answers that are irrelevant to the question, are spam, or aim at misleading the discussion?
2. Which users boost their reputation on a fraudulent basis?

We have ignored comments for several reasons. First, we want to keep track of the discussion, which is mainly carried by the posts, not the comments. Second, merging the comments into the posts would be time-consuming, as the dataset provides comments separately. Furthermore, comments are written in response to a single post and mostly contain details about that post rather than the whole question. Finally, ratings and badges are based on posts, not comments. So the specific types of anomaly we are looking for are to be found in posts.

3.1 Methodology

In this section, we present the analyses performed on the proposed network. The analyses aim at detecting spammer accounts and, as a result, the spam answers. Based on [4, 6], spammers create a star-like network, so we first detect star-like sub-networks. To do so, we create the ego-net of each individual node and then study its neighbor nodes. A star-like sub-network is detected if few of the neighbors connect directly to one another; the node at the center of a star-like sub-network is likely to be a spammer. The other question raised in the previous section concerns detecting the nodes that try to falsely boost their reputation. This is done by detecting communities whose internal connections are unusually tight [19].

Finding Star-Like Structure. In order to detect star-like structures, we have to detect cliques of size 3, i.e. triads, in the ego network of each node. To study ego networks, we choose the nodes with the highest betweenness; since these nodes connect components of the network to each other, they are likely to form star-like structures. Figure 1 shows the pseudo code of the algorithm proposed in this paper for detecting star-like ego-networks; an illustrative sketch of this procedure is given below.
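The paper gives this procedure only as the pseudo code of Fig. 1. Purely as an illustration, a minimal Python sketch of the same idea follows; the graph is assumed to be a NetworkX user graph, and top_k and connectivity_threshold are illustrative parameters, not values taken from the paper.

```python
import networkx as nx

def star_like_ego_nets(G, top_k=20, connectivity_threshold=0.3):
    """Flag candidate spammer nodes whose ego networks are star-like.

    A node is treated as star-like if only a small fraction of its
    neighbors are directly connected to each other (few triads in its
    ego-net). `top_k` and `connectivity_threshold` are assumptions.
    """
    # Work on an undirected view and rank nodes by betweenness centrality,
    # since the paper examines high-betweenness nodes first.
    U = G.to_undirected()
    betweenness = nx.betweenness_centrality(U)
    candidates = sorted(betweenness, key=betweenness.get, reverse=True)[:top_k]

    flagged = []
    for node in candidates:
        neighbors = list(U.neighbors(node))
        if len(neighbors) < 2:
            continue
        # Ego network without the ego itself: only edges among neighbors.
        ego = U.subgraph(neighbors)
        connected = sum(1 for n in neighbors if ego.degree(n) > 0)
        ratio = connected / len(neighbors)
        if ratio < connectivity_threshold:  # few inter-neighbor links -> star-like
            flagged.append((node, ratio, betweenness[node]))
    return flagged

# Usage: flagged = star_like_ego_nets(user_graph)  # user_graph: nx.DiGraph of answers
```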
Detecting Highly Interconnected Communities. Another type of anomaly considered in this paper is collusion aimed at boosting reputation. Based on [1, 19], this type of anomaly is detected by finding highly interconnected communities. Communities with this property are almost isolated from the rest of the network and have a large number of edges inside. When looking for this type of community, edge weights become important. In the first scenario we used to create the network, we did not consider edge weights. In order to weight the edges so that they reflect how strongly two nodes are connected, we use the number of times one node has answered the other node's questions as the edge weight between them. Given the nature of the anomaly we want to detect, we can ignore edge directions, as we are only looking for high interconnectedness. We assume that these sub-networks contain malicious users who try to boost their own reputation by asking or answering one another's questions. Communities are detected by identifying isolated components of the network (Fig. 2).

Fig. 1. Pseudo code for the proposed algorithm.
Fig. 2. Pseudo code for the algorithm we presented for detecting anomalous communities.

4 Experiments and Results

4.1 Dataset Specifications

The dataset has been downloaded from the Stack Exchange site and includes questions from the "Android" category. It contains user information, badges, comments posted below posts, questions and answers, the history of post changes, post links, and the votes registered for each post; each type of information is stored in a separate XML file [18]. Stack Exchange applies a control mechanism to posts and users: each post receives negative or positive ratings from users, badges are awarded based on posts, and a person's reputation is based on their posts, the number of their answers accepted as correct by other users, and so on. To work with this dataset, we first loaded the information into Excel and saved the sections in CSV format. Then, to make the data readable by the Pajek software, a Java program reads the files and saves the nodes and edges in separate files; an illustrative sketch of this conversion is given below.
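The original pipeline uses Excel, CSV files, and a Java program. As an illustration only, the following hedged Python sketch builds the same user-to-user answer edge list for the network described in the next subsection, directly from the public Stack Exchange dump; the Posts.xml attribute names below follow the published dump schema and should be treated as assumptions.

```python
import csv
import xml.etree.ElementTree as ET

def build_answer_edges(posts_xml="Posts.xml", edges_csv="edges.csv"):
    """Write a directed edge (answerer -> asker) for every answer post."""
    # First pass: remember who asked each question (PostTypeId == "1").
    question_owner = {}
    for _, row in ET.iterparse(posts_xml, events=("end",)):
        if row.tag == "row" and row.get("PostTypeId") == "1":
            question_owner[row.get("Id")] = row.get("OwnerUserId")
        row.clear()

    # Second pass: for every answer (PostTypeId == "2"), link its author
    # to the author of the parent question.
    with open(edges_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["answerer", "asker"])
        for _, row in ET.iterparse(posts_xml, events=("end",)):
            if row.tag == "row" and row.get("PostTypeId") == "2":
                asker = question_owner.get(row.get("ParentId"))
                answerer = row.get("OwnerUserId")
                if asker and answerer:
                    writer.writerow([answerer, asker])
            row.clear()
```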
Network Creation Scenarios. One method of detecting spam is to detect spammer accounts: if we create a network of users and analyze it in order to find the spammer accounts, we can simply flag posts by those accounts as spam. Obviously, we will not be able to detect spam sent by normal users this way. In this network, nodes are users and each edge represents a reply by a user to another user's post. Therefore, an edge connecting user u1 to user u2 shows that user u1 has answered one of user u2's questions. Edges are directed (from u1 towards u2). Hence, a user with a high in-degree is one whose questions have been answered by many users, and a user with a high out-degree is one who has answered the questions of many users. The latter users are more important to us here, as we are considering spam answers. Nodes have properties including id, reputation, account creation date, name, age, positive vote count, negative vote count, and badges; we use these properties to detect spammer users. A large number of users are solitary, i.e. they have neither asked nor answered any question. We remove the solitary nodes, which results in the network illustrated in Fig. 3, created from users based on the answers each user gives to other users' questions. The network has several separate components: in plenty of cases a user has asked only one question, answered by only one other user, and neither of them interacts with the rest of the users. In the following section, we explain the implementation of our proposed solution. In visualizations of the resulting network, nodes are represented as small circles (each representing a user who either answers or asks a question), and a connection between two nodes shows an answer from one user to the other's question.

Fig. 3. Network created based on the scenario.

4.2 Implementation

Detecting Star-Like Ego-Nets. In order to find possible spammer accounts, we choose nodes based on betweenness and examine those nodes first. The first experiment is done on user 137, who has the highest betweenness. Figure 4 shows the neighbor network, Fig. 5 the ego-net of node 137, and Fig. 6 the triads of the network in Fig. 4. Of the 105 nodes in total, 50 form a neighbor network with 137; therefore, the ego network of 137 is not a star-like structure, as more than 70% of its neighbors are connected to each other. Table 2 shows the properties of node 137, which are used to decide whether anything abnormal exists about this node.

Fig. 4. Neighbor network of user 137.
Fig. 5. Ego network of node 137.
Fig. 6. Triads of the neighbor network of node 137.

Table 2. Node 137 properties.
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age | Cb
137 | 14905 | 2010-09-14T02:48:38.087 | Matt | 1236 | 18 | – | 0.0040

The next node in decreasing order of betweenness is 16575. Figures 7 and 8 show the ego-net and neighbor network of this node, respectively. There are 502 nodes in 16575's neighborhood, but only 135 of them are connected to each other. To analyze this node further, we check its properties (Table 3). Considering the upvote count of this node compared to its downvotes, its high reputation, and its 79 badges, it is unlikely that this node is a spammer, although the ego network of this user is quite close to a star structure.

Fig. 7. Ego network of 16575.
Fig. 8. Neighbor network of 16575.

Table 3. Properties of node 16575.
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Cb
16575 | 45479 | 2012-07-02T20:06:13.047 | Izzy | 1452 | 213 | 0.0034

The third experiment is done on user 1465. Of the 272 nodes in 1465's neighborhood, 110 are connected to each other (45%). Considering this node's properties, we can see that it has a high reputation, but its downvotes outnumber its upvotes, so it is possible that 1465 is a spammer (Figs. 9 and 10). Considering other properties of this node, we can see that this user has had 1012 posts with an average rating of 3.32, an average view count of 20500, an average of 1.33 answers per question, and an average of 1.42 comments per post.

Fig. 9. Ego network of 1465.
Fig. 10. Neighbor network of 1465.
We compare these numbers to the overall average values (Table 4). The average values for user 1465 are above, or almost equal to, the overall values, based on which we conclude that user 1465 is not a spammer, despite the initial suspicion. Other nodes with high betweenness are studied in the same way.

Table 4. Properties of node 1465 compared to the overall average.
Average | Score | ViewCount | AnswerCount | CommentCount | FavoriteCount
All data | 1.75 | 2937.04 | 1.175 | 1.226 | 1.655
1465 | 3.32 | 20500.61 | 1.33 | 1.42 | 5.762

Detecting Communities. Communities are detected by identifying isolated components of the network. We omit components with fewer than 4 nodes; the result is shown in Fig. 11. We take the biggest component, detect the communities inside it, and remove the edges that connect communities to each other (Fig. 12). In order to detect highly interconnected communities, each community is studied separately. For each community, we study the degree distribution, the most central node, and the average reputation of the community.

Fig. 11. Communities in the network.
Fig. 12. Communities inside the biggest component of the network after removing components with fewer than 4 nodes and the edges between components.
Fig. 13. Community with the highest number of nodes.

As seen in Fig. 13, this sub-network has a star-like structure and is not highly interconnected. The most central node has the properties listed in Table 5. This user's reputation is higher than the overall average reputation, and nothing is anomalous about this node, so we move on to the next community.

Table 5. Properties of node 40036.
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age
40036 | 3705 | 2013-08-25T09:42:20.677 | RossC | 913 | 885 | –

One of the communities does not have a star-like structure, which makes it a candidate for high interconnectedness (Fig. 14). Its biggest clique is shown in Fig. 15. All the nodes in Table 6 were created within two weeks; most of them have a high reputation, and their upvotes far outnumber their downvotes. The clique created in this community is possibly an anomaly because it resembles a highly interconnected sub-network, and given that the other communities do not have a similar structure, this structure is abnormal. The reason most communities have a star-like structure is that experts in each field answer questions within their own expertise and rarely answer questions in all fields; therefore, most users have asked only a few questions, and those questions have been answered by a small number of experts in that specific field, who sit at the centers of the stars. For the community with a different structure, there are two hypotheses: (1) it contains a number of experts who communicate with one another and rarely answer other users' questions, or (2) it contains users who have joined the network in order to collect badges and boost their reputation. Considering the creation times of the users in this clique, the second hypothesis is reinforced.
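A hedged sketch of the community screening step described above (connected components, internal edge density, and largest clique), assuming an undirected, optionally weighted NetworkX graph of users; the density threshold is an illustrative assumption, not a value from the paper.

```python
import networkx as nx

def suspicious_communities(G, min_size=4, density_threshold=0.5):
    """Return components of G that are unusually interconnected.

    Components smaller than `min_size` are dropped, mirroring the paper;
    the density threshold is only an illustrative cut-off.
    """
    suspicious = []
    for nodes in nx.connected_components(G):
        if len(nodes) < min_size:
            continue
        community = G.subgraph(nodes)
        density = nx.density(community)        # fraction of possible edges present
        biggest_clique = max(nx.find_cliques(community), key=len)
        if density >= density_threshold:
            suspicious.append((sorted(nodes), density, biggest_clique))
    return suspicious

# Usage: communities = suspicious_communities(weighted_user_graph.to_undirected())
```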
Fig. 14. A community which is not star-like.
Fig. 15. Biggest clique.

Other communities exist whose structures differ from the star-like sub-network; Fig. 16 shows them.

Table 6. 10 highest degree centrality nodes.
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age | Degree
137 | 14905 | 2010-09-14 | Matt | 1236 | 18 | – | 70
10 | 18945 | 2010-09-13 | Bryan Denny | 1481 | 30 | 29 | 65
482 | 15609 | 2010-09-27 | Lie Ryan | 3591 | 141 | – | 56
15 | 4856 | 2010-09-13 | gary | 1498 | 44 | – | 31
594 | 3820 | 2010-10-02 | Edelcom | 376 | 2 | 54 | 23
366 | 915 | 2010-09-22 | Casebash | 154 | 1 | 28 | 18
86 | 2168 | 2010-09-13 | FoleyIsGood | 165 | 1 | 33 | 17
382 | 1804 | 2010-09-22 | BrianCooksey | 119 | 0 | 49 | 16
7 | 1687 | 2010-09-13 | Jonas | 78 | 17 | – | 16
280 | 520 | 2010-09-21 | Radek | 159 | 0 | – | 15

Fig. 16. Other communities with star structure.

5 Conclusion and Discussion

In this paper, we presented a solution for detecting anomalies in social networks. We focused on a well-known Q & A network; accordingly, the anomalies were defined as inappropriate answers (e.g. spam) and false reputation boosting. To detect these two types of anomalies, we suggested and applied two different approaches: for detecting spammers, we used a method that detects star-like ego networks, and for detecting false reputation boosting, we detected highly interconnected sub-networks. As another contribution of this paper, we considered network structure and node properties at the same time, which helps to obtain more accurate results.

Detecting anomalies in social networks depends heavily on the type, structure, and content of the network. Different network scenarios exist depending on the type of anomaly to be detected, and the solution in turn differs with the network creation scenario. All of this makes it impossible to present a general-purpose anomaly detection method. The limitations of this research include the challenge of combining network analysis results with mining results on node properties: as seen in this paper, we analyzed node properties after finding the most probably abnormal nodes using network-based methods, yet there is no unique systematic solution to this.

As future paths for this research, one can consider the following:

• Analysis and detection of other possible types of anomalies in a typical Q & A social network, such as spurious expertise, irrelevant answers, offensive comments, etc.
• Extension of the research to other user-feedback-based areas such as product reviews, discussion forums, and social groups, each of which is potentially susceptible to spam and reputation boosting.
• Implementation of different network generation scenarios, e.g. a weighted graph of users based on the number of interactions between two users, or a second-layer network generated from the keywords of users and questions. These scenarios might help detect abnormal behavior better within the current context.

References

1. Savage, D., Zhang, X., Yu, X., Chou, P., Wang, Q.: Anomaly detection in online social networks. Soc. Netw. 39, 62–70 (2014)
2. Liu, Y., Chawla, S.: Social media anomaly detection: challenges and solutions. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 817–818. ACM, Cambridge (2017)
3. Akoglu, L., McGlohon, M.: Anomaly detection in large graphs. CMU-CS-09-173 Technical Report (2009)
4. Akoglu, L., McGlohon, M., Faloutsos, C.: Oddball: spotting anomalies in weighted graphs.
In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. LNCS, vol. 6119. Springer, Berlin (2010) Anomaly Detection in Q & A Based Social Networks 357 5. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection for discrete sequences: a survey. IEEE Trans. Knowl. Data Eng. 24(5), 823–839 (2012) 6. Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood formation and anomaly detection in bipartite graphs. In: Fifth IEEE International Conference on Data Mining, pp. 418–425. IEEE Computer Society, Washington, DC (2005) 7. Cheng, H., Tan, P.N., Potter, C., Klooster, S.: Detection and characterization of anomalies in multivariate time series. In: Proceedings 8. Tong, H., Lin, C.-Y.: Non-negative residual matrix factorization with application to graph anomaly detection. In: Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 143–153. Society for Industrial and Applied Mathematics (2011) 9. Qiu, H., Liu, Y., Subrahmanya, N.A., Li, W.: Granger causality for time-series anomaly detection. In: IEEE 12th International Conference on Data Mining (ICDM), pp. 1074–1079. IEEE (2012) 10. Sun, P., Chawla, S., Arunasalam, B.: Mining for outliers in sequential databases. In: Proceedings of the 2006 SIAM International Conference on Data Mining, pp. 94–105. Society for Industrial and Applied Mathematics (2006) 11. Ge, Y., Xiong, H., Liu, C., Zhou, Z.H.: A taxi driving fraud detection system. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), pp. 181–190. IEEE (2011) 12. Wong, W.K., Moore, A.W., Cooper, G.F., Wagner, M.M.: Bayesian network anomaly pattern detection for disease outbreaks. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 808–815. IEEE (2003) 13. Friedland, L., Jensen, D.: Finding tribes: identifying close-knit individuals from employment patterns. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 290–299. ACM, Vancouver, August 2007 14. Yu, R., He, X., Liu, Y.: Glad: group anomaly detection in social media analysis. ACM Trans. Knowl. Discov. Data (TKDD) 10(2), 18 (2015) 15. Bindu, P.V., Thilagam, P.S., Ahuja, D.: Discovering suspicious behavior in multilayer social networks. Comput. Hum. Behav. 73, 568–582 (2017) 16. Chino, D.Y., Costa, A.F., Traina, A.J., Faloutsos, C.: VolTime: unsupervised anomaly detection on users’ online activity volume. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 108–116. Society for Industrial and Applied Mathematics (2017) 17. Laleh, N., Carminati, B., Ferrari, E.: Risk assessment in social networks based on user anomalous behaviour. IEEE Trans. Dependable Secure Comput. (2016) 18. Stack Exchange Data Dump. https://archive.org/details/stackexchange. Accessed 9 Nov 2017 19. Pandit, S., Chau, D.H., Wang, S., Faloutsos, C.: Netprobe: a fast and scalable system for fraud detection in online auction networks. In: Proceedings of the 16th International Conference on World Wide Web, pp. 201–210. ACM (2007) 358 N. Soltani et al. A Study of Measurement of Audience in Social Networks Mohammed Al-Maitah(?) Computer Science Department, Community College, King Saud University, Riyadh, Saudi Arabia malmaitah@ksu.edu.sa Abstract. This article is dedicated to surveying and analyzing Facebook account performance and developing a set of indicators, which can describe audience of Facebook user. 
The raw experimental data were gathered and analyzed using statistical methods initially developed for Twitter. Based on these, the audience was classified into categories; the main attributes of updates were then studied carefully to develop derived indicators that reflect not only audience quality but also information coverage and, in part, influence (e.g. growth of authority), and the results are demonstrated using graphical charts. The indicators were generalized into formulae, providing a basis for further studies of Facebook account activity. Directions for future work are listed in the conclusion.

Keywords: Social network · Performance · Facebook · Influence · Account survey

1 Introduction

The Facebook engine provides very few attributes to analyze. Most posts are described by the number of "likes" (the number of people who marked the post) and "shares" (the number of people who also placed the post on their own page). These two attributes are not interdependent: a user can mark a post without sharing it, and share it without marking it. But even in such a simple rating system there is a set of difficulties. Firstly, there is no way to determine whether a "like" really expresses liking. A number of events are marked but not actually liked by users, for example a message about someone's death or other sad news [1]. Facebook's official position is that actions on the network are focused on positive interactions, while negative reactions should be expressed through comments. Moreover, if a post contains a link to another resource accompanied by a short comment, there is no way to determine whether it was the link itself or the user's comment that was liked. Empirical studies thus show that a 'like' indicates interest, acknowledgment, or simple support: the resource is worth enough to attract other people's attention, but not enough for them to preserve it on their personal timeline. A "share", in turn, marks an event or text so important to the user that he or she decided to preserve it. But it has the same issues as the "like": we cannot determine what exactly is important, the shared resource itself or the comment attached to it. We cannot even determine the exact number of shares, as a post can be copied directly onto a user's page with or without reference to its author, and there are social network aggregators, special sites that gather news from social networks and reprint it.

This means that we have very little raw data with which to estimate the efficiency of a Facebook account. Obviously, the average number of likes and shares can indicate a level of efficiency (and most network services work exactly that way), but to make a proper estimate we need to know more. For example, the Klout service tries to measure the influence of a given social network user. It gains access to the user's account, tracks the impact of every activity, summarizes them, and produces a final estimate in Klout points from 0 to 100. Klout's algorithms are closed and heavily protected by patents, but the main parameters of its estimation are simple [2, 3]:

• the number of followers (or subscribers; Klout uses the same approach for all major world-wide social networks, including Facebook and Twitter);
• the number of likes and shares;
• the number of users engaged in conversation (i.e. users who leave at least one comment);
• interaction with other users who have a higher score.

According to Klout, the highest scores in 2014 belonged to Barack Obama, Beyonce, and Britney Spears, and later, in 2018, to Barack Obama, Justin Bieber, Zooey Deschanel, and, surprisingly, The Beatles, that is, to accounts that very often publish newsworthy information. This can be compared with the Nielsen rating for TV shows: the more people watch a show, the higher the estimate [4]. There are even studies showing that the Klout score depends on a logarithmic function of the number of subscribers and of users enlisted in conversations [3]. So a Klout estimate can show popularity, but it does not show efficiency. Moreover, it does not work for people who create original yet highly specialized content and have their own devoted audience [3]. Take, for example, Drew Karpyshyn, one of the bestselling writers for Star Wars (and a script writer for the award-winning game Mass Effect). Is it a surprise that his Klout rating is only 53? Dan Abnett has a similarly low rating of 54 points, and even Umberto Eco has 55 points. These people are not unpopular, quite the opposite, but their popularity does not rely on frequent activity and on staying in touch with major global events or memes. So the real question is not the estimate itself. We need a method that can measure relative popularity and efficiency, not in the global context of the social network, but in the context of an account's potential and devoted audience. Such a method will reflect popularity more realistically than estimates based on frequent updates. To reach this goal we need a proper measurement of that audience, which is the main subject of this article.

2 Related Works

In recent years a number of studies of social networks have been performed, most of them on the Twitter platform. One very comprehensive work was conducted by Kwak, Lee, Park and Moon [5], who surveyed more than 4000 trending topics and about 106 million tweets. An analysis of this scale is possible due to the small size of a "tweet", a short message or even just a hashtag (a short slogan used for trending topics on Twitter). Questions of influence in social networks were covered by the theoretical works [6–9], which suggested that a social network can be described as a graph of relationships, so that influence can be modeled with threshold and cascade approximations. Kempe also proposed a set of mathematical approaches for maximizing influence within social networks using marketing strategies. Newman, Watts et al. [10] suggest that the analysis can also be conducted using random graphs with given degree distributions. Such a model makes it possible to describe not only the social network as a whole but also sub-networks such as groups and communities. In contrast, [11] consider a social network as a net of directed links that can be marked, propagated, and mentioned. They differentiate influence in terms of marking (likes, etc.) from influence in terms of propagation; they were perhaps the first to point out that a high in-degree does not necessarily mean real influence over other users. But all these surveys were conducted on Twitter due to its short-messaging nature, as mentioned before. Facebook is still a much less attractive platform for statistical and estimation studies, and relevant studies of its content are rare, so we mainly use Twitter-based work as a basis.
3 Data Extraction and Analysis

Measurement of the audience can be performed only on a very specific group (or segment) within a social network. We developed a set of requirements for such a group: (a) the group must be large enough; (b) there must be at least three opinion leaders within it; (c) the group must have a high update rate; (d) updates must contain original content or original comments, to ensure that the audience has minimal influence from outside. As the experimental space we therefore selected the Ukrainian segment of Facebook. Here is a checklist against our criteria.

Large enough network segment: According to [12], the segment has 2,143,140 users, about 4.72% of the total country population (SocialBakers Facebook Statistics Ukraine, 2017).

Opinion leaders: In Ukraine, at least 20 influential opinion leaders exist who reside primarily on Facebook (i.e., their original content appears there earlier than in the national media) [13]. Moreover, the Proceedings of ECSM-2014 report that about 40% of the Ukrainian population (more precisely, between 38% and 49%, depending on the internal situation) describe Facebook as their primary source of information about important events [14].

High update rate: Our observations show that Ukrainian political and social life produces at least three main events per day (on the hybrid war, on the political process, on everyday life) and about ten events of smaller value. So the daily update rate of an average Facebook account with a certain number of readers is about 3 to 5 updates a day, ranging from long posts to one-liners.

Original content: ECSM-2014 also shows that content in the Ukrainian segment of Facebook more often contains original information and opinions than traditional media do [13, 14].

We therefore picked eight influential accounts that already have a devoted audience, a certain number of readers (more than 10,000), and a certain position in Ukrainian society, and observed them over one month, October 2017. This period was also the last month of an electoral rally, so the active Facebook audience was at its maximum and the measurement fairly accurate. To preserve privacy, we identify the observed accounts only by their initials and summarize their characteristics in Table 1.

Table 1. Base characteristics of observed accounts.
User | Updates per month | Subscribers
O. T. | 11 | 34761
A. Y. | 46 | 290344
A. A. | 28 | 244785
A. G. | 94 | 75070
H. H. | 134 | 18268
Y. S. | 24 | 43037
P. P. | 94 | 264982
Y. T. | 50 | 76600

This update performance can be measured using certain indicators. Figure 1 shows the detailed like performance of one account (namely H. H.), chosen because it has a very large number of updates. The selected accounts performed throughout October as shown in Table 2.

Fig. 1. Account update performance (likes).

Table 2. Account raw performance.
User | Average likes | Average shares | Updates with higher like rate | Min likes | Max likes
O. T. | 132 | 21 | 3 | 15 | 302
A. Y. | 4143 | 350 | 21 | 26 | 9972
A. A. | 4288 | 500 | 9 | 482 | 21989
A. G. | 1222 | 157 | 40 | 97 | 4341
H. H. | 255 | 24 | 40 | 12 | 2306
Y. S. | 1129 | 150 | 7 | 60 | 5964
P. P. | 2663 | 207 | 37 | 750 | 8436
Y. T. | 406 | 32 | 18 | 96 | 943

We can see in the figure that the performance of different posts varies from very low to very high. Such wide diversity allows us to split the general audience into three main categories:
• Supporters (or devoted audience): their number is described by the minimal like rate. This is the lowest level of interested audience of an account: such people tend to like every post of a befriended or tracked account simply to support it, even sad or bad news that cannot really be marked positively.
• Regular audience: their number is described by the average like rate. This is the number of guaranteed readers on which the Facebook user can count when posting a new update.
• Potential audience: their number is described by the maximal number of likes. It is the current potential that the account can reach if a proper information policy is conducted.

Similarly, we can build a chart for shares, displayed in Fig. 2. This indicator characterizes not so much the audience as the sensitivity level of the account owner (i.e., how well the updates correspond to the feelings and views of the subscribers). Hence we have the following categories of topics, depending on their share rate.

Fig. 2. Account update performance (shares).

Notes of zero importance: such updates have zero shares. Mostly these are everyday notes containing information useful only to the account owner.

Notes for a limited audience: these topics mostly have "friends-only" visibility and are intended for sharing only among close friends, partners, and people with similar interests. They include questions, requests, and so on. The share rate for such topics is below average.

Main topics: these are updates with an average share rate (with a certain spread of values) and contain the main topics that attract people to the account. Typically this is an opinion on a specific interest, e.g. economics, politics, games, music, etc., which can be described as a serious hobby or the professional activity of the account owner.

Socially important topics (or hit topics): this category contains the share hits. The higher the rate, the more important the topic to which the update is dedicated. Hits are very rare (see the chart) and often, though not necessarily, have a very high share rate compared to most updates on other topics. It can be pointed out empirically that the Klout rating depends strongly on hits: if an account has a small number of hits, it will have a low Klout score, as well as low values of other statistically based popularity estimates.

Using this raw data, we can build at least two indicators that can be used to measure the audience of an account.

Active and passive audience: the active audience A1 is calculated as the ratio between the average number of likes and the total number of readers. The passive audience A'1 is the complementary value, obtained as the difference between 100 percent and the value of A1 (see Fig. 3):

A1 = Navg.like / Nreaders × 100%    (1)
A'1 = 100% − A1    (2)

Fig. 3. Active audience percentage (blue bars) and social importance (red bars).

Social importance of an account is the ratio between the average number of shares and the total number of readers, similar to the previous indicator:

A2 = Navg.shares / Nreaders × 100%    (3)

Social importance cannot be high for a personal account; otherwise it is not a personal account but a global or local media outlet, which serves as a primary source for a very large number of other accounts. This indicator can indeed be used to determine whether an account belongs to a real person or is a media front end. An importance of more than 0.5% is very good for a person, and an importance higher than 10% is a mark of a media outlet.

Having these two base indicators, we can proceed to derived indicators. For example, the sensitivity level of an account can be measured as the ratio between the average like rate and the number of updates per month:
E1 = Navg.like / Nupd/month × 100%    (4)

This indicator can be used to determine how the account owner's main topics are valued by his or her audience. Moreover, the monthly change of the sensitivity level can be used to evaluate the growth or decline of the account's authority within its regular audience. This indicator does not depend on hit topics, so it is more precise than other statistical ratings.

The next indicator is calculated as the ratio between the minimal and maximal like rates and shows audience coverage:

E2 = Nmin.likes / Nmax.likes × 100%    (5)

Using this indicator and its monthly change, it is also possible to measure the growth of an account's popularity. Likewise, by carefully studying hit topics along with the monthly change of audience coverage, we can evaluate how well the account owner's views correspond to the views and interests of his or her subscribers.

Finally, using the ratio between the average number of shares and the average like rate, we can determine relevance. Updates that are important to the account's audience will not only be "liked" but also "shared", so the larger the percentage of such updates, the larger the value of this indicator:

E3 = Navg.shares / Navg.likes × 100%    (6)

Similar to social importance, this indicator can also be used to determine whether an account is a media outlet. For a personal account it shows the degree of opinion leadership; the persons with the highest sensitivity values are the opinion leaders of the group.
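As an illustration only, the six indicators above can be computed from the raw counts in Tables 1 and 2 with a few lines of Python; the function and argument names below are hypothetical, and the input values simply mirror one row of the tables.

```python
def audience_indicators(avg_likes, avg_shares, min_likes, max_likes,
                        readers, updates_per_month):
    """Compute indicators (1)-(6) for one account, as percentages."""
    a1 = avg_likes / readers * 100            # (1) active audience
    a1_passive = 100 - a1                     # (2) passive audience
    a2 = avg_shares / readers * 100           # (3) social importance
    e1 = avg_likes / updates_per_month * 100  # (4) sensitivity level
    e2 = min_likes / max_likes * 100          # (5) audience coverage
    e3 = avg_shares / avg_likes * 100         # (6) relevance
    return {"A1": a1, "A1_passive": a1_passive, "A2": a2,
            "E1": e1, "E2": e2, "E3": e3}

# Example with the row for account O. T. from Tables 1 and 2:
print(audience_indicators(avg_likes=132, avg_shares=21, min_likes=15,
                          max_likes=302, readers=34761, updates_per_month=11))
```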
Let us calculate these indicators for our test subjects (see Fig. 4). Of course, this study is only an approach to an estimation method, but even in this short form it can be used for analysis in social networks. It can even address the problem of the "invisible audience". In social networks the audience largely remains invisible to users and can be estimated only indirectly, via feedback. But feedback is unstable and varies from day to day, because users may simply be logged out, may not have seen a particular post, and so on. For big media products the audience can be estimated via surveys and web analytics, but for individuals such means are out of reach, so they do not see their audience. Yet that "invisible audience" is critical for them, and our method can quantify it and help them improve their media activity.

Fig. 4. Sensitivity (blue bars), audience coverage (red bars) and relevance (green bars).

4 Conclusion and Future Works

This article covers an experiment conducted over only one month, from raw data to a certain degree of generalization, summarized as a set of indicators and formulas. Given the high rate of events in the selected social network segment, this survey is just an outline, merely an approach to a more complex and more general estimation method. For example, we did not include the number of comments in our survey for two main reasons: (a) we simply do not have a method to determine whether a comment is automated or belongs to a real person and represents a real opinion; (b) we do not have an appropriate method for estimating the value of a comment (Facebook only allows liking a comment).

The question of fake accounts and automated comments is open and highly disputable. Facebook itself estimates that between 5.5% and 11.2% of the accounts on its platform are fake [15]. There are also web services that estimate the number of fakes among the friends of a given account based on certain criteria [16, 17]; such tools are provided by SocialBakers [12], and there are also methods to distinguish fakes from real profiles [18], among others. But these only allow estimating the overall number of fakes, not the nature of a particular comment and its author. So there is a need for a detailed study of comments, which is one of our main goals for future work. Our next goal is to create an integral rating estimate for an account, which can provide an alternative to Klout and other frequency-dependent statistical tools. We intend to survey the selected accounts closely over longer periods and determine not only the base indicators but also the dynamics of their change. The third direction of our future work is surveying trending topics on Facebook, their origins, flow, and process of propagation, along with an analysis of the interest spaces related to them. Such complex studies will be useful not only for exploring information flow in a social network, but will also help people improve their popularity and promote their original content without the need for frequent updates and dependence on global news traffic.

References

1. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, New York, USA, pp. 49–62 (2009)
2. Golliher, S.: How I reverse engineered Klout score. Online journal by Sean Golliher. http://www.seangolliher.com/2011/uncategorized/how-i-reversed-engineered-klout-score-to-an-r2-094/
3. Stevenson, S.: What your Klout score really means. Wired. http://www.wired.com/2012/04/ff_klout/all/. Accessed Apr 2012
4. Drula, G.: Social and online media research – data, metrics and methods. Rev. Appl. Socio Econ. Res. 3, 77–86 (2012)
5. Haewoon, K., Changhyun, L., Hosung, P., Sue, M.: What is Twitter, a social network or a news media. In: Proceedings of the 19th International Conference on World Wide Web, New York, USA, pp. 591–600 (2010)
6. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. Theory Comput. Open Access J. 11, 105–147 (2015)
7. Ruixu, G.: Research on information spreading model of social network. In: Second International Conference on Instrumentation and Measurement, Computer, Communication and Control, Beijing, China, pp. 918–920 (2012)
8. Tang, J.: Computational models for social network analysis. A brief survey. In: Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, pp. 921–925 (2017)
9. Jingbo, M., Lourdes, M., Amanda, H., Minwoong, C., Jeff, C.: Research on social networking sites and social support from 2004 to 2015: a narrative review and directions for future research. Cyberpsychol. Behav. Soc. Netw. 20(1), 44–51 (2017)
10. Newman, M.E.J., Watts, D.J., Strogatz, S.H.: Random graph models of social networks. Proc. Nat. Acad. Sci. U.S.A. 99 (2002)
11. Cha, M., Haddadi, H., Benevenuto, F., Gummadi, K.P.: Measuring user influence in Twitter: the million follower fallacy. In: Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM 2010), Washington DC, USA, pp. 10–17 (2010)
12. SocialBakers Facebook Statistics (Ukraine). http://www.socialbakers.com/statistics/facebook/pages/total/ukraine/
13. Jaitne, M., Kantola, H.: Countering threats: a comprehensive model for utilization of social media for security and law enforcement authorities. In: Proceedings of the 13th European Conference on Cyberwarfare and Security, Greece, pp. 102–109 (2014)
14. Ronzhyn, A.: The use of Facebook and Twitter during the 2013–2014 protests in Ukraine.
In: Proceedings of the European Conference on Social Media, University of Brighton, UK, pp. 442–448 (2014) 15. Facebook Estimates from 5.5 to 11.2 accounts are fake. The Next Web. http:// thenextweb.com/facebook/2014/02/03/facebookestimates-5-5-11-2-accounts-fake/ 16. Veerasamy, N., Labuschagne, W.: Determining trust factors of social networking sites. In: Proceedings of 12th European Conference on Information Warfare and Security, Finland, pp. 288–297 (2013) A Study of Measurement of Audience in Social Networks 367 17. Sirivianos, M, Cao, Q., Yang, X., Pregueiro, T.: Aiding the detection of fake accounts in large scale social online services. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, USENIX Association Berkeley, CA, USA, pp. 15–15 (2012) 18. Cook, D.: Identity multipliers and the mistaken Twittering of birds of feather. In: Proceedings of the 13th European Conference on Cyberwarfare and Security, Greece, pp. 42–48 (2014) 368 M. Al-Maitah Predicting Disease Outbreaks Using Social Media: Finding Trustworthy Users Razieh Nokhbeh Zaeem(B) , David Liau, and K. Suzanne Barber Center for Identity, The University of Texas at Austin, Austin, USA {razieh,sbarber}@identity.utexas.edu, davidliau@utexas.edu Abstract. The use of Internet data sources, in particular social media, for biosurveillance has gained attention and credibility in recent years. Finding related and reliable posts on social media is key to performing successful biosurveillance utilizing social media data. While researchers have implemented various approaches to ?lter and rank social media posts, the fact that these posts are inherently related by the credibility of the poster (i.e., social media user) remains overlooked. We propose six trust ?lters to ?lter and rank trustworthy social media users, as opposed to concentrating on isolated posts. We present a novel biosurveillance application that gathers social media data related to a bio-event, pro-cesses the data to ?nd the most trustworthy users and hence their trust-worthy posts, and feeds these posts to other biosurveillance applications, including our own. We further present preliminary experiments to eval-uate the e?ectiveness of the proposed ?lters and discuss future improve-ments. Our work paves the way for collecting more reliable social media data to improve biosurveillance applications. Keywords: Biosurveillance · Social media · Twitter · Trust 1 Introduction Thanks to the ever-growing use of social media, the Internet is now a rich source of opinions, narratives, and information, expressed by millions of users in the form of unstructured text. These users report, among many other things, their encounters with diseases and epidemics. Internet biosurveillance utilizes the data sources found on the Internet (such as news and social media) to improve detec-tion, situational awareness, and forecasting of epidemiological events. In fact, since mid 1990’s, researches have used Internet biosurveillance techniques to predict a wide range of events, from in?uenza [5] to earthquakes [9]. Internet biosurveillance takes advantage of what is called hivemind on social media—the collective intelligence of the Internet users. The sources of Internet biosurveillance (e.g., social media) are, generally, timely, comprehensive, and available [10]. These sources, however, are enormous and noisy. An important pre-processing step to draw meaningful results from these sources is to ?lter and rank the most related parts of the data sources. 
Such filtering and ranking is widely recognized in the literature. For instance, in their overview of Internet biosurveillance [10], Hartley et al. break the process into four steps: (1) the collection of data from the Internet; (2) the processing of the data into information; (3) the assembling of that information into analysis; and (4) the propagation of the analysis to biosurveillance experts. They identify relevancy ranking as one of the important sub-steps of processing data into information in step two, before the actual analysis begins in step three.

In order to filter and rank the posts (i.e., Twitter posts or news articles), researchers have implemented various approaches, such as Machine Learning (e.g., Naive Bayes and Support Vector Machines [6,19]) and Natural Language Processing (e.g., keyword and semantic-based filtering [8] and Latent Dirichlet Allocation [7]). All the previous efforts, however, have focused on ranking the posts independently [12], ignoring the fact that these posts (Twitter posts or news articles) are inherently related by virtue of the credibility of the poster (the Twitter user or news agency). Furthermore, users of social media can post about anything they wish to talk about. Some users talk about their illnesses online, and these are the users we wish to monitor, as they give us a sampling of the union's infectious disease state. However, users may talk about being ill to elicit sympathy from other users, or they may simply be faking it. It is important to evaluate the trustworthiness of users before extracting data for analysis.

Unlike previous work, we observe that the credibility of users with respect to a given epidemiological event should be taken into account when filtering and ranking related posts. We propose six trust filters that filter and rank social media users who post about epidemiological events: Expertise, Experience, Authority, Reputation, Identity and Proximity. These trust filters assess the credibility or trustworthiness of a user by considering the structure of the social network (e.g., the number of Twitter followers), the user's history of posts, the user's geo-location, and his/her most recent post. While we focus on the relevancy ranking sub-step by measuring user trustworthiness, we introduce a comprehensive framework that performs the entire cycle of Internet biosurveillance as described by the four steps of Hartley et al. [10]. We leave the technical details of some of the steps out of this paper and discuss them separately elsewhere. Finally, in a preliminary set of experiments, we collect the posts and geo-locations of 2,000 real Twitter users, investigate the effectiveness of our proposed trust filters, observe the statistics of the filter scores and the correlations between the filters, and suggest future improvements.

2 Overview: Surety Bio-Event App

The Surety Bio-Event App is our Internet biosurveillance application developed at the University of Texas at Austin for the DTRA Biosurveillance Ecosystem (BSVE) [18] framework. The BSVE provides capabilities allowing for disease prediction and forecasting, similar to the functionality of weather forecasting.

Fig. 1. Overview of the Surety Bio-Event App.
The BSVE is a virtual platform with a set of integrated tools and data ana-lytics which support real-time biosurveillance for early warning and course of action analysis. The BSVE provides a platform to access a large variety of social media data feeds, a software development kit to create applications (apps), var-ious tools, and the cloud service to host a web-based user interface. Developers develop BSVE apps and deploy them to the BSVE to be ultimately used by biosurveillance experts and analysts. Our Surety Bio-Event app covers the entire cycle of Internet biosurveil-lance according to previous work [10]. Figure 1 shows a high level picture of the Surety Bio-Event App. The four steps are: (1) Multi-Source Real-Time Data which collects data (Sect. 5), (2) Trust Filter which processes data into infor-mation (Sect. 3), (3) Surveillance Optimization (including early detection, situ-ational awareness and prediction) which assembles the information into analysis (Sect. 6), and (4) Forecasts and Predictions which propagates the analysis to experts through a Graphical User Interface (Sect. 4). Furthermore, the Surety app is user customizable and receives Goals and Situational Awareness as well as Historical Data, Detections, and Predictions from biosurveillance experts. Figure 2 shows a more detailed view of the App. In this paper, we concen-trate on the second step, the trust ?lter, while we broadly review the other steps too. With data collected from social media, the trust ?lter component of the App evaluates the data sources to ?nd the most trustworthy social media users with respect to a given surveillance goal. The trust ?lter component optimizes range, availability and quality of data using the combination of algorithms mea-suring six dimensions of trust: Expertise, Experience, Authority, Reputation, Identity and Proximity. The primary functions of the trust ?lter component are: (1) improving the quality of data employed by BSVE applications and analysts 372 R. N. Zaeem et al. Fig. 2. Diagram of data collection and analysis with the Surety Bio-Event App (SBEA). to make biosurveillance decisions, (2) tracking and quantifying trustworthiness of known, preferred users to guard against data bias and quality drift for BSVE applications and analysts, and (3) expanding the landscape of possible trusted social media users by o?ering trusted but previously unexplored users via rec-ommendation noti?cations to BSVE applications and analysts. 3 Trust Filters In order to determine user trustworthiness, we introduce the concept of a trust ?lter—a score between 0 and 1 assigned to a user (e.g., a Twitter user) which rates his/her trustworthiness with respect to a given criteria. We propose six trust ?lters: Expertise. Expertise measures a user’s involvement in the subject of inter-est [3]. We de?ne Expertise as the probability that a user will generate content on the topic in question (e.g., an In?uenza outbreak). Using the user’s history of posts, Expertise can be calculated as how often a speci?c user has written about the subject of interest in the past. Expertise(ui,t) = p(t|ui) = #Posts(ui,t)/#Posts(ui), where ui is a user in the social media network, t is a topic, and p(t|ui) is the probability that a user has generated content on that topic. We calculate this probability by counting the number of that user’s posts on the topic and dividing by his/her total number of posts. 
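As an illustration of the Expertise definition above, a minimal Python sketch follows; the keyword classifier is reduced to a hypothetical keyword list, which is an assumption rather than the authors' actual classifier.

```python
FLU_KEYWORDS = {"flu", "influenza", "fever"}   # illustrative keyword set only

def is_on_topic(post_text, keywords=FLU_KEYWORDS):
    """Very rough keyword-based stand-in for the paper's topic classifier."""
    words = post_text.lower().split()
    return any(word.strip(".,!?#@") in keywords for word in words)

def expertise(user_posts, keywords=FLU_KEYWORDS):
    """Expertise(u, t) = #on-topic posts of u / #posts of u (0 if no posts)."""
    if not user_posts:
        return 0.0
    on_topic = sum(1 for post in user_posts if is_on_topic(post, keywords))
    return on_topic / len(user_posts)

# Example: expertise(["Got the flu again :(", "Nice weather today"]) -> 0.5
```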
For all the filters, we use a keyword-based classifier to distinguish the posts concerning the topic of interest and the users posting about that topic.

Experience. Experience is the degree to which a user's posts are corroborated by other users. Informally, Experience seeks to measure how well a user's posts about a subject are corroborated by the ground truth. Assuming that the average involvement of all users in the subject of interest reveals the truth about the outside world (e.g., everybody posts about flu when a flu outbreak actually happens), we can use this average to calculate Experience. To do so, we measure the difference between a user's involvement in the subject, given by Expertise, and the average Expertise. To get a score between 0 and 1, and using the fact that Expertise is already between 0 and 1, we calculate Experience as

Experience(ui,t) = 1 − |Expertise(t) − Expertise(ui,t)|,

where Expertise(t) denotes the average Expertise over all users. The closer one's Expertise is to the average Expertise, the higher his/her Experience score.

Authority. Authority is the number and quality of social media links a user receives from Hubs as an Authority [3]. A link is a relationship between users, e.g., likes and comments on Facebook, or following on Twitter. We utilize the Hyperlink-Induced Topic Search (HITS) [11] algorithm, a link analysis algorithm widely used to rank Web pages and other entities connected by links, to obtain a score between 0 and 1. In this algorithm, certain users, known as Hubs, serve as trustworthy pointers to many other users, known as Authorities. Therefore, Authorities are the users that have been recognized within the social media community.

Reputation. Reputation is the number and quality of social media links to a user. We utilize the PageRank algorithm [2], another widely used ranking algorithm, to obtain a score between 0 and 1.
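Assuming the pruned follower network is available as a directed NetworkX graph, the Authority and Reputation scores described above could be sketched with off-the-shelf HITS and PageRank implementations; this is an illustration under stated assumptions, not the authors' code, and the rescaling to [0, 1] is a design choice of the sketch.

```python
import networkx as nx

def authority_and_reputation(follow_graph):
    """Score users with HITS authorities (Authority) and PageRank (Reputation).

    `follow_graph` is a directed graph with an edge u -> v when u follows
    (or links to) v. Scores are rescaled to [0, 1] by dividing by the maximum.
    """
    hubs, authorities = nx.hits(follow_graph, max_iter=500, normalized=True)
    pagerank = nx.pagerank(follow_graph, alpha=0.85)

    def rescale(scores):
        top = max(scores.values()) or 1.0
        return {user: score / top for user, score in scores.items()}

    return rescale(authorities), rescale(pagerank)

# Usage: authority, reputation = authority_and_reputation(pruned_user_graph)
```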
Identity. Identity is the degree of familial or social closeness between a user and the person afflicted with the disease. The Identity filter is defined as the relationship between the posting user who talks about the disease and the subject of the post who has somehow encountered the disease. If the user reports the disease about himself/herself, the assigned Identity score is the maximum value, 1. If the user reports about a close family member, the score is higher than when the user reports about an acquaintance. We utilize Natural Language Processing and greedy algorithms to calculate this score. The filter first finds all possible grammatical subjects of a sentence (e.g., a Twitter post); then, using the words in the family tree, it finds the closest family relationship to those subjects and reports that relationship (e.g., self, mother, co-worker, son) as the Identity. A score is assigned to this relationship, ranging from 1 (i.e., reporting the disease about oneself) to 0 (i.e., talking about total strangers). To get the Identity score of a user, the Identity values of all of his/her posts about the subject of interest are calculated and averaged. More details on this filter can be found in our previous work [13].

Proximity. Proximity estimates the distance of a user from the event (e.g., the disease outbreak location). Using relationship distance (i.e., the Identity score) and geographical distance (through geo-tagged posts and the user's geo-location), Proximity utilizes a greedy algorithm to perform graph traversal over the social media network and then combines the Identity value with the distance value to calculate Proximity, as shown in Algorithm 1.

Algorithm 1. Proximity Algorithm
Input: Directed user graph G
Output: Proximity scores user.proximity
 1  Initialize Identity threshold T;
 2  for user in users do
 3      if user.identity > T then
 4          user.separation = 1 / user.identity;
 5      else
 6          user.separation = ∞;
 7      end
 8  end
 9  for user u in G do
10      for user v in G \ {u} do
11          distance = distance from v to u;
12          u.separation = min(u.separation, v.separation × distance);
13      end
14  end
15  for user in users do
16      user.proximity = 1 − user.separation;
17  end

Note that the network graph used by the trust filters is pruned so that it contains only those users who have posted (at least once) about the subject of interest. As a result, trust filter scores are calculated with a focus on the community that discusses a particular subject on social media.

4 Trust Filter GUI

Figure 3 displays the Graphical User Interface (GUI) of the trust filter tab of the Surety app. The GUI is composed of four smaller windows. At the top left, the social media users are listed and, for each, the value of each of the six trust filters is shown. Next to the gear icons, the names of the six trust filters appear: Identity, Reputation, Experience, Expertise, Authority, and Proximity. The last column is the Combined trust score, currently the average of the six filters. On the GUI, the analyst or BSVE app developer selects a trust filter and can then sort the users with respect to that score (descending or ascending); the higher the score, the more trustworthy the user with respect to that trust filter. In Fig. 3, the users are sorted by Proximity in descending order. The analyst or BSVE app developer can also select favorite users that he/she has found trustworthy over time and mark them with a star. The GUI suggests social media users that have a higher combined score than the favorite users with a blue glow under the user name (trusted but previously unexplored users), as shown in the figure. The analyst can also review the favorite users (bring all the favorites to the top).

Fig. 3. Trust filter GUI of the Surety Bio-Event App.

On the GUI, the Network Graph is the top right window, which displays the users on social media as nodes, along with their links (e.g., following on Twitter) and sizes. The analyst can select a trust filter to size the nodes in the Network Graph; in this figure, the node sizes are based on Identity. At the bottom left of the GUI, under Node Histogram, the GUI charts the trust filter scores of the top five users for the selected filter. At the bottom right, under Trust Score Distribution, the GUI displays the range of user trustworthiness based on each filter and on the combined score: the distribution of user trust scores, with tunable granularity (set to 0.1 in this figure), shows the number of social media users that have a given trust score.

5 Data Collection

In this section and the next, we briefly overview the first and third steps of the biosurveillance process, namely data collection and optimization, for the sake of completeness. The Surety app (1) uses data already available on the BSVE and (2) collects data and uploads it to the BSVE.
5 Data Collection

In this section and the next, we briefly overview the first and third steps of the biosurveillance process, namely data collection and optimization, for the sake of completeness. The Surety app (1) uses data already available on the BSVE and (2) collects data and uploads it to the BSVE. The data sources monitored within the BSVE include well-established and trusted data providers such as the Centers for Disease Control (CDC) and the World Health Organization (WHO). Data from these sources give the analyst working with the BSVE the best possible measure of the state of disease within the country. In addition, the BSVE collects data from news sources and Twitter. Of what the BSVE already provides, Twitter contains a treasure trove of information; however, other sources such as blogs, Instagram, and Reddit have been underused. The Surety app aims to fix these gaps in data collection. The trust filter part of the Surety app seeks to collect data from other sources not currently supported by the BSVE that contain connectivity (network) information and are typically focused on individuals as opposed to news feeds. Figure 4 shows some of the data sources for the Surety app. Note that not all the data sources are candidates to be used with trust filters; some of them provide only time series data, which is used by the optimization part. The data sources that are appropriate for trust filters, and for which we have implemented methods within our API to collect historical user data as well as connections to streaming APIs, are: Twitter, WordPress, Instagram, Tumblr, Reddit, and Wikipedia.

6 Optimization

The third step of the biosurveillance process analyzes large collections of trusted data sources to assemble systems that efficiently achieve user-specified surveillance goals, such as early outbreak detection. This analysis is accomplished through optimization algorithms that evaluate data collections by comparison to historical and simulated bio-events. The Surety app yields trusted data sources, along with statistical models and performance metrics, to support future surveillance activities. The trust filter part of the Surety app is capable of collecting a wide range of data and then formatting that data into the time series data source required by the optimization part. Our optimization algorithms, discussed elsewhere, include early detection, situational awareness, and prediction [14].

7 Implementation

Our app is implemented with a Python Flask back-end and a JavaScript front-end. The back-end was developed to allow for user interactivity with the front end. It serves JSON data generated from the algorithms to the user interface. The application is integrated into the BSVE.

Fig. 4. Data collection sources of the Surety App.
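The implementation description above suggests a back-end along the following lines; the route name, payload shape, and stubbed scores are assumptions for illustration, not the actual Surety code.

```python
# Sketch of a Flask route serving trust-filter scores as JSON to the front end.
from flask import Flask, jsonify

app = Flask(__name__)

# In the real app these would come from the trust-filter pipeline; stubbed here.
TRUST_SCORES = {
    "user_a": {"identity": 0.5, "reputation": 0.1, "experience": 0.9,
               "expertise": 0.2, "authority": 0.0, "proximity": 0.4},
}

@app.route("/api/trust-scores")
def trust_scores():
    payload = []
    for user, scores in TRUST_SCORES.items():
        combined = sum(scores.values()) / len(scores)
        payload.append({"user": user, **scores, "combined": round(combined, 3)})
    return jsonify(payload)

if __name__ == "__main__":
    app.run(debug=True)
```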
8 Experiments

We have designed a preliminary set of experiments to answer the following research question: how well do the proposed filters perform? To answer this question, we plan to use seed data (e.g., a synthetic network of users, posts, and disease outbreaks) as well as actual data (e.g., an actual network of Twitter users and their posts).
1. We observe the values of the trust filters and their trends.
2. We compare filter scores against hospital data to judge the ability of the trust filters to detect disease outbreaks.
In this paper, we observe the trend of the proposed trust filters for a real network of 2,000 Twitter users and their posts. The use of seed data, as well as the comparison with hospital data, is work in progress. For this set of experiments, we downloaded the posts and geo-locations of 2,000 Twitter users. To do so, we performed a keyword search for the word 'flu' on the Twitter API and then downloaded the user profile information (including geo-location coordinates), the users' friends' timelines, lists of friends and followers, and the past 30 days of tweets. We started the download on July 22, 2016 and, because of Twitter's bandwidth limitations, it took us a week to download 2,000 users who had posted at least once with the word 'flu', totaling 33 GB. Note that not all the posts of these users over the past 30 days are necessarily about flu; we use a keyword-based classifier to distinguish flu-related posts.

Figure 5 shows the filters' maximum, minimum, and average values. The Identity trust filter has an average (as well as peak) value of about 0.48, which means that, when people do post about flu, they tend to post about flu encounters of their nuclear family members, as 0.5 is assigned to nuclear family members for the Identity score. Reputation and Authority scores are uniformly close to 0, implying that the network we downloaded had very little connectivity. The low degree of connectivity is expected, since people who post about flu do not necessarily tend to follow others who post about flu. The average value of Expertise was close to 0 as well, meaning that even among those who have posted about flu at least once, the number of flu-related posts over a 30-day period was relatively very low. The average value of 0.95 for Experience shows that most users' Expertise scores were close to the average Expertise, i.e., close to 0. Investigating the outliers should point to users who were unusually concerned about flu. Finally, we found that Proximity should be redefined to make it independent of Identity and to show concrete distance from outbreak locations.

Fig. 5. Statistics of trust filters.

Figures 6, 7, 8, and 9 display the most interesting correlations we found between the filter values. Figure 6 shows that the Combined score is most heavily influenced by Identity; these two filters are related with an R2 of 0.49. Therefore, we might need to normalize and weight the filters to obtain a new, less-biased definition of the Combined score. Figure 7 charts the correlation between the Reputation and Authority filters (R2 = 0.15). These two filters are not closely related; therefore, while both measure the connectivity of the network, they consider different aspects of connectivity.

Fig. 6. Correlation between Combined Filter and Identity.
Fig. 7. Correlation between Reputation and Authority.

Figure 8 confirms that Experience and Expertise are inversely correlated. We might need to update the definition of Experience to measure the corroboration by others differently.

Fig. 8. Correlation between Expertise and Experience.

Finally, while Proximity is initialized with Identity, as Fig. 9 shows, it is rather independent of Identity. While the Proximity of users to a potential outbreak location can be compared to one another, the absolute value of Proximity still does not show the concrete physical distance between the user and a flu outbreak location.

8.1 Feature Importance

We compare our trust filters with other simple features that are widely studied in processing Twitter data [16]. Figure 10 and Table 1 show the feature importance scores from the Scikit-Learn kit [17]. We use the Extremely Randomized Tree Classifier as our method to evaluate the importance of each feature.
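A minimal sketch of this importance computation with scikit-learn follows; the feature names mirror Table 1, while the feature matrix and labels here are placeholders rather than the study's data.

```python
# Sketch: Gini-based feature importances with extremely randomized trees.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

feature_names = ["num_posts", "experience", "expertise", "avg_post_length",
                 "num_at_tags", "identity", "reputation", "proximity",
                 "retweet", "contains_question", "contains_exclaim", "authority"]

X = np.random.rand(2000, len(feature_names))   # placeholder user-level features
y = np.random.randint(0, 2, size=2000)         # placeholder labels

model = ExtraTreesClassifier(n_estimators=250, random_state=0).fit(X, y)
for name, score in sorted(zip(feature_names, model.feature_importances_),
                          key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.3f}")
```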
We utilize a library [1,15] in which the Gini coe?cient is used as a measure to the importance of each feature. In short, the total importance scores sum up to one and the larger the score is, the more important in decision that feature is. As Table 1 shows, the best feature from Extremely Random Tree Classi?er is the number of posts by a speci?c user within the given period of time. Consequently, the ?lters that are based on the number of related posts, such as Experience and Expertise, work well. However, the number of posts can be easily forged with posting robots or Spam posts. Two other features that are known to perform Predicting Disease Outbreaks Using Social Media 381 Fig. 9. Correlation between Identity and Proximity. Fig. 10. Feature importance. 382 R. N. Zaeem et al. well in similar types of problems are the average post length and the number of tagged Twitter IDs which start with the symbol “@” [4]. Therefore, poten-tial ?lters to consider can be based on these features. Identity, Reputation, and Proximity all perform better than the other features studied in previous work, including retweet, and whether or not the posts contain ‘?’ and ‘!’. Finally, Authority performs poorly and can be considered irrelevant. Table 1. Features and corresponding importance scores. Feature Importance score Number of posts 0.205 Experience 0.143 Expertise 0.132 Avg. post length 0.129 Number of @ tags 0.111 Identity 0.100 Reputation 0.099 Proximity 0.033 Retweet 0.029 Contains ‘?’ 0.010 Contains ‘!’ 0.009 Authority 0.002 9 Conclusion Filtering and ranking social media posts is essential to biosurveillance applica-tions that monitor them to detect and forecast disease outbreaks. We introduced a novel way to ?lter and rank social media posts by concentrating on the trust-worthiness of social media users with respect to a given subject. We proposed six trust ?lters and used them in the context of a complete biosurveillance applica-tion. We further evaluated these trust ?lters by observing how they perform on a real set of Twitter posts downloaded from 2,000 users for over 30 days. Improv-ing the ?lter de?nitions and judging the e?ectiveness of the ?lters in ?nding actual disease outbreaks are two major future work directions. Acknowledgment. Surety Bio-Event App is a long term project of the Center for Identity. The authors thank Guangyu Lin, Roger A. Maloney, Ethan Baer, Nolan Corcoran, Benjamin L. Cook, Neal Ormsbee, Haowei Sun, Zeynep Ertem, Kai Liu, and Lauren A. Meyers for their contribution to this project. This work has been funded by Defense Threat Reduction Agency (DTRA) under contract HDTRA1-14-C-0114 CB10002. Predicting Disease Outbreaks Using Social Media 383 References 1. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classi?cation and Regres-sion Trees. Statistics/Probability Series. Wadsworth Publishing Company, Belmont (1984) 2. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1), 107–117 (1998) 3. Budalakoti, S., Barber, K.S.: Authority vs a?nity: modeling user intent in expert ?nding. In: 2010 IEEE Second International Conference on Social Computing (SocialCom), pp. 371–378. IEEE (2010) 4. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on Twitter. In: Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 675–684. ACM, New York (2011) 5. Collier, N., Son, N.T., Nguyen, N.M.: OMG U got ?u? Analysis of shared health messages for bio-surveillance. J. Biomed. Semant. 
2(5), S9 (2011) 6. Denecke, K., Krieck, M., Otrusina, L., Smrz, P., Dolog, P., Nejdl, W., Velasco, E.: How to exploit Twitter for public health monitoring. Methods Inf. Med. 52(4), 326–39 (2013) 7. Diaz-Aviles, E., Stewart, A., Velasco, E., Denecke, K., Nejdl, W.: Epidemic intelli-gence for the crowd, by the crowd. Int. AAAI Conf. Web Soc. Media 12, 439–442 (2012) 8. Doan, S., Ohno-Machado, L., Collier, N.: Enhancing Twitter data analysis with simple semantic ?ltering: example in tracking in?uenza-like illnesses. In: IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology (HISB), pp. 62–71 (2012) 9. Doan, S., Vo, B.-K.H., Collier, N.: An analysis of Twitter messages in the 2011 Tohoku earthquake. In: International Conference on Electronic Healthcare, pp. 58–66. Springer (2011) 10. Hartley, D.M., Nelson, N.P., Arthur, R., Barboza, P., Collier, N., Lightfoot, N., Linge, J., Goot, E., Mawudeku, A., Mado?, L.: An overview of internet biosurveil-lance. Clin. Microbiol. Infect. 19(11), 1006–1013 (2013) 11. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46(5), 604–632 (1999) 12. Lamb, A., Paul, M.J., Dredze, M.: Separating fact from fear: tracking ?u infections on Twitter. In: HLT-NAACL, pp. 789–795 (2013) 13. Lin, G., Nokhbeh Zaeem, R., Sun, H., Barber, K.S.: Trust ?lter for disease surveil-lance: Identity. In: IEEE Intelligent Systems Conference, pp. 1059–1066, September 2017 14. Liu, K., Srinivasan, R., Ertem, Z., Meyers, L.: Optimizing early detection of emerg-ing outbreaks. Poster presented at: Epidemics 6, Sitges, Spain, November 2017 15. Louppe, G., Wehenkel, L., Sutera, A., Geurts, P.: Understanding variable impor-tances in forests of randomized trees. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, NIPS 2013, USA, vol. 1, pp. 431–439. Curran Associates Inc. (2013) 16. ODonovan, J., Kang, B., Meyer, G., H¨ollerer, T., Adalii, S.: Credibility in context: an analysis of feature distributions in Twitter. In: 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pp. 293–301, September 2012 384 R. N. Zaeem et al. 17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011) 18. Digital Infuzion: DTRA Biosurveillance Ecosystem (BSVE) (2017) 19. Torii, M., Yin, L., Nguyen, T., Mazumdar, C.T., Liu, H., Hartley, D.M., Nelson, N.P.: An exploratory study of a text classi?cation framework for internet-based surveillance of emerging epidemics. Int. J. Med. Inform. 80(1), 56–66 (2011) Detecting Comments Showing Risk for Suicide in YouTube Jiahui Gao1 , Qijin Cheng2(&) , and Philip L. H. Yu1 1 Department of Statistics and Actuarial Science, The University of Hong Kong, Pok Fu Lam, Hong Kong 2 Department of Social Work, The Chinese University of Hong Kong, Shatin, Hong Kong qcheng@cuhk.edu.hk Abstract. Natural language processing (NLP) with Cantonese, a mixture of Traditional Chinese, borrowed characters to represent spoken terms, and Eng-lish, is largely under developed. To apply NLP to detect social media posts showing suicide risk, which is a rare event in regular population, is even more challenging. 
This paper tried different text mining methods to classify comments in Cantonese on YouTube whether they indicate suicidal risk. Based on word vector feature, classi?cation algorithms such as SVM, AdaBoost, Random Forest, and LSTM are employed to detect the comments’ risk level. To address the imbalance issue of the data, both re-sampling and focal loss methods are used. Based on improvement on both data and algorithm level, the LSTM algorithm can achieve more satis?ed testing classi?cation results (84.3% and 84.5% g-mean, respectively). The study demonstrates the potential of auto-matically detected suicide risk in Cantonese social media posts. Keywords: SuicideText miningSocial mediaCantonese Sentiment analysis 1 Introduction Suicide is a serious public health concern globally and Hong Kong is no exception. The latest suicide rate in Hong Kong is about 11.7 per 100,000 [1], which is about the medium level in the global context [2]. In addition, suicide is the leading cause of death among young people in Hong Kong [3]. Due to the popularity of social networking sites in recent years, many young people were found to disclose their emotional distress and even suicidal thoughts through social media [4]. Suicide prevention professionals are, therefore, highly concerned with those online contents and hope to detect online posts showing risk for suicide as early as possible so that interventions can be delivered and lives can be saved. Q. Cheng—Equal ?rst author. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 385–400, 2019. https://doi.org/10.1007/978-3-030-02686-8_30 1.1 Related Work Some pioneering efforts have been conducted to detect textual content showing suicide risk. Some basic machine learning methods were used to classify suicide notes, achieving 71% accuracy [6]. However, the accumulation of suicide notes is restricted by very limited data sources and can be time consuming. Thanks to the instantaneity of social media content, detection of suicide ideation in social network can strengthen suicide prevention to a large extent. However, few work of suicide text detection in social media has been conducted. In 2007, blogs were ?rst used to detect users at risk. Yen-Pei Huang [7] applied simple counting methods based on suicide-related key-words to detect bloggers with suicide tendency [8], which only achieved 35% success rate with low accuracy. Based on simple token unigram bag-of-word features, machine learning algorithms were also used to predict the suicide tendency on Twitter [9]. Concerning users’ behavior feature in social network, M. Johnson Vioulès [10] applied a martingale framework for suicide warning signs detection. However, Vioulès’ study was run on only two Twitter users’ data. In Mainland China, researchers have tried different statistical and machine learning methods to detect Weibo (a Chinese social media site) posts showing emotional distress and suicide risk [11]. Although achieving promising results, they also noted a few challenges. First, dataset for detecting suicide risk is often highly imbalanced, given that suicidal behavior is a rare event. A number of solutions to the class-imbalance problem are proposed both at the data and algorithm levels [12]. At data level, researchers often had to conduct re-sampling to adjust the imbalance, such as random over-sampling of minority class with replacement, random under-sampling of majority class, direct over-sampling, direct under-sampling, and so on [13]. 
At algorithm level, adjustment of cost function in algorithm is suggested. In addition, those studies often retrospectively collected data from social media and used the historical data for training and testing. However, such solutions will make it questionable to directly apply the results in real life, where suicide is indeed a rare event and social media contents are constantly updating and evolving. Although both Mainland China and Hong Kong consist mainly of Chinese ethnics, Hong Kong people speak Cantonese dialect and often write in a mixture of Cantonese and English due to its history of being a British colony. Due to the absence of Can-tonese natural language processing tool, text feature extraction in Cantonese is often based on simple n-gram features rather than word features [14]. A study found that Cantonese pre-treated by a Mandarin word segmentation tool consistently outperforms the character n-gram split [15]. In order to classify the at-risk online text better, we need to do Cantonese word segmentation using a satisfactory method. The main contribution of this paper is fourfold. First, this might be the ?rst time that Cantonese social media texts’ word vector features are used for detecting suicide risk. We conducted Cantonese word segmentation based on a relatively complete Cantonese dictionary by combining dictionaries on the internet. Second, unlike pre-vious suicide detection that relied on retrospective accumulation, we investigated an algorithm to detect suicide risk based on comments’ text features immediately. Third, deep learning method was used to train the word vector model and achieved a better result than custom machine learning model. Lastly, we introduced the focal loss, in 386 J. Gao et al. addition to the re-sampling method, to tackle the imbalance issue in text ?eld and achieved a satisfactory result. Focal loss, a new loss function, is found to be an effective alternative for dealing with class imbalance [16]. 1.2 Paper Outline In the next section, the construction of Cantonese resource base will be briefly intro-duced. Section 3 presents the methods we used to preprocess the suicide-related comments. Section 4 introduces the feature extraction and classi?cation methods. Evaluation metrics will also be introduced in this section. Section 5 analyzes the experiment result. In the last section, this paper is concluded and future works are discussed. 2 Construction of Cantonese Resource Base Social media posts are openly available at large. However, to label which posts show risk for suicide requires annotations by suicide prevention professionals. Besides, even though the simple Chinese and English text mining is relatively mature, little work was done in the Cantonese text mining. The absence of popular Cantonese dictionary is also an obstacle in the ?eld. 2.1 Data Collection and Annotation There has been a surge of student suicides in Hong Kong in recent years, which was prominently reported by local press and generated wide discussion among the public. One of the authors, QC, has been monitoring how people responded to this issue in social media. She identi?ed 162 YouTube videos relating to this issue published during the 2015/16 school year, to which there were 5051 comments posted in the public domain. The comments were downloaded by calling YouTube API and annotated by QC and a trained research assistant (RA). 
Those comments indicating that the com-menter was having or had serious suicidal thoughts, including having attempted sui-cide, were labelled as at-risk. Both QC and the research assistant have ?rst coded a random sample of 100 comments separately. The inter-rater reliability was examined by Cohen’s Kappa coef?cient as 0.91, which indicated high agreement. Then the RA completed the annotation of the rest of comments. 2.2 Construction of Cantonese Corpus In fact, Cantonese is primarily a spoken language. The most important mechanism by which Cantonese is represented in written form is phonetic borrowing. Sometimes, when confronting the ‘sound but no character’ problem, Cantonese speakers resorted to the strategy of creating a new character to represent a Cantonese word [17]. Similar to comments in YouTube, local online forums also contain a large amount of short Cantonese texts mixed with extra characters. In order to acquire more written Detecting Comments Showing Risk for Suicide in YouTube 387 Cantonese corpus, 4,310,566 written Cantonese posts were crawled from a popular local online forum [18]. 2.3 Construction of Cantonese Dictionary Word segmentation is a very important part before text classi?cation. A good Can-tonese dictionary is important in doing word segmentation. Through combining 26 Cantonese lexicons in Sogou [19], a popular text input software in China, we con-structed a Cantonese dictionary containing 597,731 Cantonese words. 3 Text Preprocessing YouTube comments are mainly written in Cantonese. However, English is also a popular and of?cial language in Hong Kong, 9% of the total comments that we col-lected from YouTube are in English. To complete a full analysis, those English words were ?rst translated into Cantonese. 3.1 Translation Because of lacking direct translation tool from English to Hong Kong Cantonese, two steps were made to translate English comments to Cantonese. First, English words were translated into simpli?ed Chinese using the Google Translate API [20] for Python. Second, Open Chinese Converter Project (OpenCC) [21] was used to convert simpli-?ed Chinese to Hong Kong Cantonese. OpenCC is an open source project for con-version between Traditional Chinese and Simpli?ed Chinese, supporting regional idioms in Mainland China and Hong Kong [22]. 3.2 Filtering Stop words, by de?nition, are those words that appear in the texts frequently but do not carry signi?cant information [23]. Effective text mining can be achieved by removal of stop words. Cantonese and Mandarin Chinese are within the same language family, so their written forms share a number of words in common [15]. Due to the absence of Cantonese stop word dictionary, we used the Mandarin stop words dictionary to ?lter comments. Similar to English stop words, Chinese stop words are usually those words with part of speeches like adjectives, adverbs, prepositions, interjections, and auxil-iaries. Adverb “ ” (of), preposition “ ” (in), conjunction “ ” (because of) and “ ” (so) are some examples [23]. According to the guidelines for manual annotation, a comment would be labelled as non-risk if it only contains stop words, punctuations or emoji, because these simple terms cannot provide suf?cient information for the readers to assess suicide risk. Following this guideline, if a comment only contains these terms, it will be detected and classi?ed as non-risk comment at ?rst. For other comments, these terms will be removed ?rst and the remaining text will be classi?ed using the classi?cation models. 388 J. 
4 Text Classification for Suicidality Detection

4.1 Feature Representation

It is common to represent a document as a vector. In this paper, we utilized the Jieba [24] segmentation tool and the word2vec [25] model to acquire sentence vectors. Unlike English, Chinese sentences do not contain spaces, so words in a sentence cannot be detected automatically. Based on the Cantonese dictionary constructed in the last section, we conducted text segmentation using Jieba [24], a Chinese text segmentation tool, to split each sentence into words. A distributed representation of words in a vector space can group similar words together and help algorithms achieve better results. This paper used the word2vec model developed by Mikolov [25] for learning vector representations of words. We set the dimensionality of the vectors to 100 and learned the word vectors from the large dataset (4,310,566 Cantonese posts) collected from the local forum. We then averaged the word vectors in a comment to acquire its document vector. Figure 1 shows the word for 'suicide' (in Traditional Chinese) and its 100 neighbouring words according to the cosine similarity between word vectors. The 100-dimensional word vector data were projected into 3 dimensions using Principal Component Analysis (PCA).

Fig. 1. Word vector visualization.

4.2 Classifier

After filtering out those comments containing only stop words, punctuation, or emoji as non-risk data, the remaining comments need to be classified. Both machine learning and deep learning methods are popular in text classification; this paper used algorithms from both fields to detect whether a comment shows risk for suicide.

Support Vector Machine (SVM). The Support Vector Machine (SVM) has been shown to be highly effective at traditional text categorization [26]. This method searches for a hyperplane, represented by a vector, that separates the document vectors of the two classes with maximum margin.

AdaBoost. Adaptive Boosting (AdaBoost) aims at constructing a "strong" classifier by combining a number of "weak" classifiers [14]. Weights are used in AdaBoost to increase the importance of misclassified data and decrease the importance of correctly classified data. By combining these weak classifiers based on their relative performance, AdaBoost can achieve improved accuracy.

Random Forest (RF). Random forest is a variant of the bagging methods proposed by Breiman [27]. Similar to bagging, random forest constructs a decision tree for each of the bootstrap samples drawn from the data. Unlike bagging, random forest randomly selects a subset of predictors to determine the optimal splitting rule in each node of the trees, in order to avoid overfitting [28].

Long Short-Term Memory network (LSTM). The Long Short-Term Memory network (LSTM) [29] is a special kind of recurrent neural network, capable of learning long-term dependencies. We trained the LSTM model on words, using the pre-trained word2vec embedding layer with 100 dimensions. As shown in Fig. 2, the model takes the mean of the outputs of all LSTM cells to form a feature vector and then applies multinomial logistic regression to this feature vector [30].

Fig. 2. Long short-term memory.
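A rough Keras sketch of the architecture just described (mean-pooled LSTM outputs on top of a fixed word2vec embedding) is given below. The 100-dimensional embeddings match the paper's setting; the number of LSTM units, vocabulary size, and sequence length are assumptions.

```python
# Sketch: word2vec embeddings -> LSTM -> mean over time steps -> risk probability.
from tensorflow.keras import layers, models

MAX_LEN, VOCAB_SIZE, EMBED_DIM = 50, 20000, 100  # assumed sizes

def build_lstm(embedding_matrix=None):
    inputs = layers.Input(shape=(MAX_LEN,))
    embed = layers.Embedding(
        VOCAB_SIZE, EMBED_DIM,
        weights=[embedding_matrix] if embedding_matrix is not None else None,
        trainable=False)(inputs)                       # pre-trained word2vec layer
    seq = layers.LSTM(64, return_sequences=True)(embed)  # outputs of all LSTM cells
    pooled = layers.GlobalAveragePooling1D()(seq)         # mean over time steps
    output = layers.Dense(1, activation="sigmoid")(pooled)  # at-risk probability
    model = models.Model(inputs, output)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Usage (placeholders): model = build_lstm(w2v_matrix)
# model.fit(X_train, y_train, batch_size=32, epochs=4)
```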
Topic seed words classification model. The suicide-related comment data studied here are extremely imbalanced, with far more non-risk than at-risk comments. This paper therefore designed a topic seed-word classification model to filter out non-risk comments first and then use the relatively balanced data to train the classifiers. First, seed words [31] relating to the suicide topic were summarized under the guidance of suicide research experts; the seed word list is shown in Table 1. If a document vector is far away from the seed list, the comment can be predicted as non-risk. We measure this by the cosine similarity between a document and the seed list.

In Fig. 3, the x-axis shows the cutoff value for cosine similarity below which a comment is predicted to be non-risk, and the y-axis shows the misclassification rate. From 0.6 to 0.65 the misclassification rate does not increase much, but it rises sharply at a cutoff of 0.7. As we use the seed words here to filter out non-risk comments, we chose 0.65 as the cutoff value for the cosine similarity. If a comment's cosine similarity to the seed list is smaller than 0.65, it is classified as non-risk and removed in the first stage; the remaining comments are then studied in the second stage to identify at-risk comments. Using 0.65 as the cutoff, only 0.22% of the comments in the training data were misclassified. Table 2 shows the top 10 non-risk comments with the highest cosine similarity that were filtered out by the seed words.

Fig. 3. Misclassification rate for various cosine similarity cutoff values.

Table 1. Seed words. (The original Chinese/Cantonese seed words do not reproduce here; only the English glosses are listed.) Suicide (Simplified Chinese); Suicide (Traditional Chinese); Will go die (Both Traditional and Simplified Chinese); Go die (Both Traditional and Simplified Chinese); Why I am a human being (Cantonese); Press (Traditional Chinese); Pressure (Simplified Chinese); Suffering (Traditional Chinese); End one's life (Traditional Chinese); End one's life (Simplified Chinese); Jump off (Cantonese); Die (Both Traditional and Simplified Chinese); End (Both Traditional and Simplified Chinese); Vile (Traditional Chinese); Disgust (Traditional Chinese); Going to die (Both Traditional and Simplified Chinese); Want to die (Both Traditional and Simplified Chinese); Negative energy (Traditional Chinese); Cry (Cantonese); Very hard (Both Traditional and Simplified Chinese); Very tired (Cantonese); Cutting wrist (Cantonese); Jump off a building (Traditional Chinese); Jump off a building (Simplified Chinese); Cutting wrist (Mandarin); Cutting hand (Mandarin); Leave this world (Cantonese); Very stressful (Traditional Chinese); Super stressful (Traditional Chinese); Give up (Traditional Chinese); Heartbroken (Traditional Chinese); Jump off (Mandarin); Unhappy (Cantonese); Helpless (Traditional Chinese); Garbage (Both Traditional and Simplified Chinese); No hope (Cantonese); Pain (Both Traditional and Simplified Chinese); Collapse (Traditional Chinese); Don't want to live (Cantonese); End one's own life (Traditional Chinese); Want suicide (Traditional Chinese); End life (Traditional Chinese); Kill oneself (Traditional Chinese); Hopeless (Traditional Chinese); What is the point to live on (Traditional Chinese); Die (Both Traditional and Simplified Chinese); Better to die (Cantonese); Jumped (Cantonese); What is the meaning of life (Traditional Chinese); Kill (Traditional Chinese).

Table 2. Selected comments filtered by seed words (cosine similarity; label 1: at-risk, 0: non-risk; English translation of the comment. The original Cantonese comment texts do not reproduce here.)
0.6499  0  Actually you have a good point
0.6498  0  Don't feel sad
0.6497  0  Come on, try your best
0.6495  0  We English teachers do a lot of homework
0.6495  0  Why so many things are arranged in the same week
0.6494  0  Thought there was something wrong
0.6493  0  But believe we are the best
0.6493  0  Have you thought how many scores you can get
0.6490  0  Come on, I believe you can do it
0.6490  0  My mom forced me to take Belilios (Note: a school in Hong Kong)
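A minimal sketch of this seed-word pre-filter is shown below. It assumes a trained gensim word2vec model and the seed list; the function names are illustrative, not the authors' code.

```python
# Sketch: segment a comment with jieba, average its word vectors into a document
# vector, and compare it to the averaged seed vector with cosine similarity.
import jieba
import numpy as np

CUTOFF = 0.65  # cosine-similarity cutoff chosen in the paper

def doc_vector(text, w2v):
    vecs = [w2v.wv[w] for w in jieba.cut(text) if w in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def seed_filter(text, w2v, seed_words):
    seed_vec = np.mean([w2v.wv[w] for w in seed_words if w in w2v.wv], axis=0)
    return "candidate" if cosine(doc_vector(text, w2v), seed_vec) >= CUTOFF else "non-risk"
```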
4.3 Loss Function

Two kinds of loss function are used in this paper. The cross entropy loss is used when the model is trained on the balanced dataset; the focal loss [16] is used when the model is trained on the imbalanced dataset.

Cross Entropy Loss. The cross entropy (CE) loss for binary classification is

CE(p, y) = -log(p) if y = 1, and -log(1 - p) otherwise,

where y specifies the class and p, with 0 <= p <= 1, is the model's estimated probability for the class y = 1. Defining

p_t = p if y = 1, and 1 - p otherwise,

the cross entropy can be rewritten as CE(p, y) = CE(p_t) = -log(p_t).

Focal Loss. To address the class imbalance problem, the focal loss [16] reshapes the standard cross entropy loss so that it down-weights the loss assigned to well-classified examples. The focal loss is defined as [16]

FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t),

where alpha_t equals alpha, with 0 <= alpha <= 1, for the positive class and 1 - alpha for the negative class, and gamma >= 0 is a tunable focusing parameter. The weight alpha_t balances the importance of positive/negative examples, while the modulating factor (1 - p_t)^gamma balances easy/hard examples (an example with a large loss is defined as a hard example).

4.4 Evaluation

The aim of this paper is to predict whether a YouTube comment shows suicide risk. The confusion matrix, as shown in Table 3, is commonly used in classification evaluation. Here, we take the at-risk class as the positive class. Our purpose is to find the at-risk users and save as many lives as possible, so the costs of false positive and false negative predictions are not the same: a false negative prediction is a serious matter, as we might miss the chance to save a life. Besides, the non-risk class dominates the data. Given such extremely imbalanced data, the error rate is no longer an appropriate performance measure [32].

Table 3. Confusion matrix
                  Predicted positive     Predicted negative
Positive class    True positive (TP)     False negative (FN)
Negative class    False positive (FP)    True negative (TN)

In this paper, we use the geometric mean of the accuracies (G-mean) [33] as the performance measure:

True Positive Rate (Acc+) = TP / (TP + FN)
True Negative Rate (Acc-) = TN / (TN + FP)
G-mean = sqrt(Acc+ x Acc-)

G-mean is a popular performance evaluation measure for imbalanced training data. The idea is to maximize the accuracy on each of the two classes while keeping these accuracies balanced [32]. For example, a high accuracy on negative examples together with a low accuracy on positive examples will result in a poor G-mean value.
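For concreteness, the two quantities above can be restated in a few lines of NumPy. This is an illustrative re-statement under the paper's definitions (labels written as 0/1), not the training code itself.

```python
# Sketch: binary focal loss and the G-mean metric.
import numpy as np

def focal_loss(p, y, alpha=0.75, gamma=1.0, eps=1e-7):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with labels y in {0, 1}."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

def g_mean(y_true, y_pred):
    """sqrt(TPR * TNR), treating 1 as the at-risk (positive) class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return float(np.sqrt(tpr * tnr))
```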
5 Experiment and Results

This paper performed suicide-related comment classification with improvements at both the data and the algorithm level.

5.1 Experimental Setting

Experimental Data. The data crawled from YouTube consist of 5,051 comments (251 at-risk and 4,800 non-risk), which were split into two datasets, with 80% for training and 20% for testing. To tackle the imbalance problem, we designed our models in two ways. One possibility is to apply under-sampling to randomly select a balanced training dataset consisting of 201 at-risk comments and 201 non-risk comments; this balanced dataset is then used to train classifiers with the cross-entropy loss. Alternatively, we can use the raw imbalanced training dataset (201 at-risk comments and 3,840 non-risk comments) to train classifiers with the focal loss.

Parameter Setting. This paper used the scikit-learn [34] library in Python to train the SVM, AdaBoost, and Random Forest models; the gensim [35] tool in Python to train the word2vec model; and the Keras [36] framework in Python to train the LSTM model. The model parameters are shown in Table 4.

Table 4. Model parameters
SVM (RBF): kernel = 'rbf', C = 1.5, gamma = 0.05
AdaBoost: base_estimator = decision tree, n_estimators = 50, learning_rate = 1, algorithm = 'SAMME.R'
Random forest: max_depth = 5, n_estimators = 10, max_features = 1
Word2vec: size = 100, min_count = 5, sg = 1
LSTM: vocab_dim = 100 (output dimension of the embedding layer), batch_size = 32 (number of samples per gradient update), n_epoch = 4 (number of epochs to train the model)
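The settings in Table 4 translate roughly into the calls below; argument names have shifted across library versions, as noted in the comments, and the corpus variables are placeholders.

```python
# Sketch: instantiating the classifiers with the parameters listed in Table 4.
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from gensim.models import Word2Vec

svm = SVC(kernel="rbf", C=1.5, gamma=0.05)
# base_estimator is named `estimator` in scikit-learn >= 1.2.
ada = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(),
                         n_estimators=50, learning_rate=1.0, algorithm="SAMME.R")
rf = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1)

# word2vec trained on the forum corpus; `size` became `vector_size` in gensim 4.
# w2v = Word2Vec(sentences, vector_size=100, min_count=5, sg=1)
```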
5.2 Experimental Results

Recall that comments containing only stop words, punctuation, or emoji are all non-risk; such comments in the training data are classified as non-risk directly. Topic seed-word classification can also be applied in advance to first remove non-risk comments and balance the dataset. The various classifiers mentioned in Sect. 4 were then trained on the remaining data. Finally, these methods were applied to the testing data; the testing results are shown in Table 5. Note that, using the under-sampling method, Set A, consisting of 402 balanced comments, was generated and used to train the classification models.

Table 5. Testing results of classification based on improvement at the data level (testing data: 50 at-risk comments and 960 non-risk comments)
Feature extraction   Classifier                  G-mean (%)
Set A, CE loss       SVM - no seed filter        78.3
                     SVM - seed filter           78.4
                     AdaBoost - no seed filter   79.2
                     AdaBoost - seed filter      78.6
                     RF - no seed filter         74.3
                     RF - seed filter            69.7
                     LSTM - no seed filter       84.3
                     LSTM - seed filter          82.3

It can be seen from Table 5 that the deep learning algorithm LSTM performed better than the traditional machine learning algorithms (SVM, AdaBoost, and RF). The LSTM classifier without filtering by the seed words performed the best, with 84.3% g-mean. The seed-word filter did not have a significant impact on classification, even though it works well for balancing the training and testing comments; this is because, with the under-sampling method, a balanced dataset was already used to train the model, so the seed-word filter is not necessary. Given that the LSTM model performs well, this paper solves the imbalance problem at the algorithm level based on the LSTM model. The raw imbalanced training dataset (Set B), without under-sampling, was used to train the model, and the focal loss was introduced into the LSTM model (setting alpha = 0.75, gamma = 1). Because an imbalanced dataset was used to train the model, we cannot simply use a 0.5 cutoff to predict a comment's risk level; based on the training dataset, we choose the threshold that achieves the highest g-mean as the model's prediction cutoff. As shown in Table 6, with the topic seed-word filter, the LSTM model with focal loss achieved 84.5% g-mean, which is slightly higher than the g-mean achieved by the LSTM with cross-entropy loss on the balanced dataset (84.3%). Using the LSTM model with focal loss, the top 5 comments with the highest predicted probability of risk are shown in Table 7.

Table 6. Testing results of classification based on improvement at the algorithm level (testing data: 50 at-risk comments and 960 non-risk comments)
Feature extraction   Classifier              G-mean (%)   Cutoff
Set B, FL loss       LSTM - no seed filter   81.8         0.20
                     LSTM - seed filter      84.5         0.25

Table 7. Comments with the highest predicted probability. (The comment texts, in Cantonese, do not reproduce here.)

6 Conclusion

This paper compared the performance of different classification algorithms based on word vector features. Because YouTube comments form a sequential list, the LSTM, which can learn sequential information, performs better than the other machine learning algorithms. Based on the topic seed-word classification model and the improvement of the loss function, it achieves the best testing performance (84.5% g-mean). The focal loss was also effective in handling the imbalanced text classification problem. In addition, when combined with the under-sampling method to classify comments, the LSTM also performed better than the other machine learning algorithms, reaching 84.3% g-mean. The study has pushed forward natural language processing for Cantonese, a complicated dialect that mixes Traditional Chinese, borrowed characters representing spoken terms, and English. It also demonstrates the potential of using machine learning methods to detect suicide risk in real social media settings.
As suicide prevention is a battle against the clock, every minute saved in detecting suicide risk and alerting intervention can be crucial. However, it is challenging to employ staff to monitor and review online content 24/7. Based on the computerized algorithm, suicide professionals can scale up the real-time monitoring of online content to detect potentially at-risk posts, based on which more timely interventions can be implemented. Acknowledgements. The study was supported by Hong Kong General Research Fund (Ref No.: 17628916). References 1. Centre for Suicide Research and Prevention, The University of Hong Kong. https://csrp.hku. hk/statistics/. Accessed 30 Mar 2018 2. World Health Organization Webpage. http://www.who.int/mental_health/suicide-prevention/ world_report_2014/en/. Accessed 30 Mar 2018 3. Cheng, Q., Chen, F., Lee, E.S.T., Yip, P.S.F.: The role of media in preventing student suicides: a Hong Kong experience. J. Affect. Disord. 227, 643–648 (2018) 4. Cheng, Q., Kwok, C.L., Zhu, T., Guan, L., Yip, P.S.F.: Suicide communication on social media and its psychological mechanisms: an examination of Chinese microblog users. Int. J. Environ. Res. Public Health 12(9), 11506–11527 (2015) 5. Chan, M., et al.: Engagement of vulnerable youths using internet platforms. PLoS ONE 12 (12), e0189023 (2017) 6. Pestian, J.P., Matykiewicz, P., Grupp-Phelan, J.: Using natural language processing to classify suicide notes. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing. Association for Computational Linguistics (2008) 7. Huang, Y.-P., Goh, T., Liew, C.L.: Hunting suicide notes in web 2.0-preliminary ?ndings. In: Ninth IEEE International Symposium on Multimedia Workshops, ISMW 2007. IEEE (2007) 8. Moreno, M.A., et al.: Feeling bad on Facebook: depression disclosures by college students on a social networking site. Depress. Anxiety 28(6), 447–455 (2011) 9. O’Dea, B., Wan, S., Batterham, P.J., Calear, A.L., Paris, C., Christensen, H.: Detecting suicidality on Twitter. Internet Interv. 2(2), 183–188 (2015) 10. Vioulès, M.J., Moulahi, B., Azé, J., Bringay, S.: Detection of suicide-related posts in Twitter data streams. IBM J. Res. Dev. 62(1), 7:1–7:12 (2018) 11. Cheng, Q., Li, T.M.H., Kwok, C.L., Zhu, T., Yip, P.S.F.: Assessing suicide risk and emotional distress in Chinese social media: a text mining and machine learning study. J. Med. Internet Res. 19(7), e243 (2017) 12. Kotsiantis, S.B.: Supervised machine learning: a review of classi?cation techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 160, 3–24 (2007) 13. Estabrooks, A., Jo, T., Japkowicz, N.: A multiple resampling method for learning from imbalanced data sets. Comput. Intell. 20(1), 18–36 (2004) 14. Zhang, Z., Ye, Q., Li, Y.: Sentiment classi?cation of Internet restaurant reviews written in Cantonese. Expert Syst. Appl. 38(6), 7674–7682 (2011) 15. Zhang, Z., Ye, Q., Li, Y., Law, R.: Sentiment classi?cation of online Cantonese reviews by supervised machine learning approaches. Int. J. Web Eng. Technol. 5(4), 382–397 (2009) 16. Lin, T.-Y., et al.: Focal loss for dense object detection. arXiv preprint arXiv:1708.02002 (2017) Detecting Comments Showing Risk for Suicide in YouTube 399 17. Cheung, K.-H., Bauer, R.S.: The representation of Cantonese with Chinese characters. University of California, Project on Linguistic Analysis (2002) 18. LIHKG Webpage. https://lihkg.com/category/30. Accessed 30 Mar 2018 19. Sogou Webpage. https://pinyin.sogou.com/dict/search/search_list/%D4%C1%D3%EF/ normal. 
Accessed 30 Mar 2018 20. Python Webpage. https://pypi.python.org/pypi/googletrans. Accessed 30 Mar 2018 21. Python Webpage. https://pypi.python.org/pypi/OpenCC. Accessed 30 Mar 2018 22. GitHub Webpage. https://github.com/BYVoid/OpenCC. Accessed 30 Mar 2018 23. Zou, F., Wang, F.L., Deng, X., Han, S., Wang, L.S.: Automatic construction of Chinese stop word list. In: Proceedings of the 5th WSEAS International Conference on Applied Computer Science (2006) 24. GitHub Webpage. https://github.com/fxsjy/jieba. Accessed 30 Mar 2018 25. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. (2013) 26. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: European Conference on Machine Learning (1998) 27. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996) 28. Liaw, A., Wiener, M.: Classi?cation and regression by randomForest. R. News 2(3), 18–22 (2002) 29. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computat. 9(8), 1735– 1780 (1997) 30. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classi?cation. Adv. Neural Inf. Process. Syst. (2015) 31. Kim, S.-M., Hovy, E.: Determining the sentiment of opinions. In: Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics (2004) 32. Liu, X.-Y., Wu, J., Zhou, Z.-H.: Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 39(2), 539–550 (2009) 33. Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. ICML, Vol. 97 (1997) 34. Scikit-learn Webpage. http://scikit-learn.org/stable/. Accessed 30 Mar 2018 35. Gensim Webpage. https://radimrehurek.com/gensim/models/word2vec.html. Accessed 30 Mar 2018 36. Keras Webpage. https://keras.io/models/sequential/. Accessed 30 Mar 2018 400 J. Gao et al. Twitter Analytics for Disaster Relevance and Disaster Phase Discovery Abeer Abdel Khaleq(&) and Ilkyeun Ra University of Colorado, Denver, CO 80204, USA {abeer.abdelkhaleq,ilkyeun.ra}@ucdenver.edu Abstract. Natural disasters happen at any time and at any place. Social media can provide an important mean for both people affected and emergency per-sonnel in sharing and receiving relevant information as the disaster unfolds across the different phases of the disaster. Focusing on the phases of pre-paredness, response and recovery, certain information needs to be retrieved due to the critical mission of emergency personnel. Such information can be directed depending on the disaster phase towards warning citizens, saving lives, or reducing the disaster impact. In this paper, we present an analytical study on Twitter data for three recent major hurricane disasters covering the three main disaster phases of preparedness, response and recovery. Our goal is to identify relevant tweets that will carry important information for disaster phase discov-ery. To achieve our goal, we propose a cloud-based system framework focused on three main components of disaster relevance classi?cation, disaster phase classi?cation and knowledge extraction. The framework is general enough for the three main disaster phases and speci?c to a hurricane disaster. 
Our results show that relevant tweets from different disaster data sets spanning different disaster phases can be classi?ed for relevancy with an accuracy around 0.86, and for disaster phase with an accuracy of 0.85, where key information for disaster management personnel can be extracted. Keywords: Twitter analytics .e Twitter data mining Social media classi?cation .e Disaster relevance classi?cation Disaster phase classi?cation .e Cloud-based analytics .e Disaster management 1 Introduction Natural disasters are large scale in impact and many of them span multiple disaster phases. Some disasters need more focus on preparedness, some on response and some on recovery. It is necessary to direct each agency to its mission during a disaster based on the disaster phase. For example, warning systems and evacuation plans need to be in place during preparedness, medical personnel need to act during response, and relief agencies will provide shelters during recovery. Twitter provides a rich platform for key information during a disaster. Analyzing and extracting informational tweets from Twitter during disasters is one of the text mining researches in recent years [1]. However, Twitter data is highly unstructured and has a lot of noise and irrelevant © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 401–417, 2019. https://doi.org/10.1007/978-3-030-02686-8_31 messages where identifying relevant tweets is a challenge [2]. There is a need to ?lter out those relevant tweets during the disaster phases and uncover insightful information. During a disaster, we may have massive number of disaster related tweets coming from many different sources carrying important disaster information. Our idea is to build a general system framework that can process the large number of the disaster related tweets and ?lter out the relevant ones that may carry important information and can be used for managing the disaster. From the collected disaster relevant Twitter data, the disaster phase, the disaster location and other key information will be extracted. Our system will be hosted in the cloud for storage and analytics processing capabilities and protecting from the potential loss of resources during a disaster. To accomplish our goal, we conducted an analytical study on Twitter data from three recent hurricanes disasters including hurricane Matthew from 2016, Harvey and Irma from 2017 across the three disaster phases of preparedness, response and recovery to have a well diverse and general data set. We chose hurricanes as a disaster type for our analytical study because they can be predicted, have a sustainable impact for response and recovery. Hurricanes are natural disasters that affect the US and other countries every year. They result in a great loss of civilians and cause a lot of damage and devastation that goes far beyond expectations. Many lives can be saved, and many resources can be sustained with minimal damage if the proper information can be delivered to the right personnel at the right time during the right phase of the disaster. This makes them applicable to our study where the three disaster phases of pre-paredness, response and recovery can be further identi?ed to provide the needed resources. Since each disaster phase has its own requirements and valuable information, it is important to distinguish between these phases and extract the right information for each phase. 
The contributions of our paper are as follows: (1) Provide a general cloud-based framework for Twitter data analytics in hurricane disaster management. (2) Identify relevant tweets during a disaster from different hurricane disaster data sets. (3) Classify the disaster phase of preparedness, response and recovery from relevant tweets. (4) Extract key knowledge from relevant tweets text such as location, key phrases and key terms that can be used by disaster emergency personnel. Our study is not geared toward creating new classi?cation algorithms. Rather it is limited to the use of existing classi?cation algorithms and methodologies to uncover the disaster relevance, the disaster phase and disaster key knowledge from the massive Twitter data that comes during a disaster. In this study we present our work on static hurricane Twitter data to build the classi?cation models, in future work we will implement the system on streaming real-time disaster data. The paper is organized as follows. Section 2 describes related work in Twitter disaster relevance, Sect. 3 describes the proposed Twitter analytics system framework along with the hurricane data sets used for the experiments, Sect. 4 presents the disaster relevance classi?cation experiment on the tweets, Sect. 5 presents the disaster phase discovery experiment on both labeled and unlabeled tweets, Sect. 6 describes the 402 A. A. Khaleq and I. Ra knowledge extraction experiment for the disaster location and other key information from relevant tweets, and ?nally Sect. 7 presents a conclusion and future work directions. 2 Related Work It has been widely acknowledged that Humanitarian Aid and Disaster Relief (HADR) responders can gain valuable insights and situational awareness by monitoring social media-based feeds, from which tactical, actionable data can be extracted from the text [3]. Ashktorab et al. [4], for example, introduced Tweedr, a Twitter-mining tool that extracts actionable information for disaster relief workers during natural disasters. The Tweedr pipeline consists of three main parts: classi?cation, clustering, and extraction. Imran et al. [5] developed an arti?cial intelligence system for disaster response that classi?es real-time Twitter data into relevant disaster categories based on keywords hashtags. Imran et al. [6] performed disaster-relevant information extraction on Twitter data for both hurricane Sandy in 2012 and Joplin tornado in 2011. In their work they proposed a two-step method for disaster-related information extraction which are classi?cation of relevance and information extraction from tweets using off-the-shelf free software. In the same context, Stowe et al. [2] performed Twitter data classi?cation for relevance before, during and after the hurricane Sandy 2012 disaster. Their method was based on binary classi?cation for both relevance and ?ne-grained categories such as action, preparation, movement, etc. They concluded that tweets can be classi?ed accurately combining a variety of linguistic and contextual features which can sub-stantially improve classi?er performance. Those research areas address tweets classi?cation and ?ne-grain category classi?- cation during a disaster without identifying the disaster phase. Wang et al. [7] pointed out that most studies with exceptions of Haworth et al. [8] and Yan et al. [9] have focused on disaster response instead of other phases because of lack of data through those phases. 
This data sparsity problem in phases like, mitigation, preparedness and recovery may cause unreliable analytical results. They emphasized that future work is needed to overcome this limitation and effort needs to be directed toward gaining more useful information for all phases of disaster management through mining social media data. To the best of our knowledge, there is no work on establishing a general classi?- cation framework of Twitter data to classify the three main disaster phases of pre-paredness, response and recovery. Most of the research work is more focused on response and on the subcategories of ?ne-grained classi?cation. There is also a lack for a general hurricane disaster classi?cation framework, thus our work will focus on the characteristics of a disaster from the three shared phases of preparedness, response and recovery speci?c to a hurricane natural disaster. Our work is different on the following aspects: 1. We propose a general hurricane disaster classi?cation framework based on three natural hurricane disaster datasets with accuracy as a measurement for classi?cation. Twitter Analytics for Disaster Relevance and Disaster Phase 403 2. We will identify relevant tweets based on textual context by manually examining and labeling the tweets and not using hashtags and keywords for a more general and accurate classi?cation. 3. We will uncover the disaster phase of preparedness, response and recovery through classi?cation of relevant tweets with accuracy as a measurement for classi?- cation. We believe these three disaster phases can be founded easily in tweets related to natural disasters like hurricanes. 3 System Framework and Data Set Our proposed system framework will have a Twitter analytics component for disaster relevancy and phase discovery specially tuned for hurricanes as part of a complete cloud-based platform for disaster management and response. This can serve as a foundation for a micro-service architecture where new components can be added, or existing ones can be updated for a new disaster phase or new requirements. As the focus of our study is on the Twitter analytics component, we plan on pursuing implementing the cloud-based framework in our future work. Fig. 1. System framework for Twitter analytics. 404 A. A. Khaleq and I. Ra Figure 1 provides the general system framework along with the Twitter analytics system workflow. Our focus in this study is on tweets texts for location and key knowledge extraction. The date and time of a disaster can be extracted from the created_at1 ?eld of the tweets and will be part of the complete framework imple-mentation of consecutive studies. Our work is focused on static Twitter data that was collected from recent hurricane disasters including hurricane Matthew, Harvey and Irma. All three disasters had sig-ni?cant impact on US and other areas with casualties and damage. As we are aiming on having a general classi?cation framework for a hurricane disaster, we sampled the data from three hurricanes to have a more general data set. We also made sure to diversify the data by covering the disaster phases of preparedness, response and recovery from each hurricane disaster. We identi?ed the disaster phase based on the disaster evolving date and time and the available hurricane information. 
We applied a variable number of geo-tagged and non-geo-tagged queries, as our focus is on identifying relevance over a general data set using the different disaster sets and different queries, without biasing the classifier toward certain tweets. We used Gnip (http://support.gnip.com/) for the historic Matthew data set and the Twitter streaming API for Harvey and Irma as they were unfolding. Table 1 provides a more detailed look at the data sets collected from the three hurricanes, listing the query used and the corresponding disaster phase.

Table 1. Collected data sets for the three hurricanes
Hurricane | Date | Query | Disaster phase | Number of tweets collected
Matthew | 10/7/2016 | track=("Hurricane Matthew") (flood OR wind OR storm OR heavy OR rain), no retweets, lang='en' | Preparedness | 27,000 over the three days
Matthew | 10/8/2016 | track=("Hurricane Matthew") (flood OR wind OR storm OR heavy OR rain), no retweets, lang='en' | Response | (included above)
Matthew | 10/9/2016 | track=("Hurricane Matthew") (flood OR wind OR storm OR heavy OR rain), no retweets, lang='en' | Recovery | (included above)
Harvey | 8/25/2017 | Bounding box including Corpus Christi, San Antonio and west of Houston, lang='en', track='Hurricane Harvey', no retweets | Preparedness | 7,728
Harvey | 8/28/2017 | Bounding box around the Houston area, lang='en', track='Hurricane, Harvey, flood, help, rescue, rain', no retweets | Response | 121,658
Harvey | 8/30/2017 | lang='en', track=Houston, no location, no retweets | Recovery | 61,940
Irma | 9/5/2017 | track='Hurricane Irma', lang='en', no retweets | Preparedness | 34,445
Irma | 9/10/2017 | Bounding box around Florida, track='irma', lang='en', no retweets | Response | 1,128
Irma | 9/11/2017 | track='Hurricane Irma', lang='en', no retweets | Recovery | 9,099

4 Disaster Relevance Classification

4.1 Disaster Relevance Annotation

Our goal is to classify a general tweet during a hurricane disaster for relevance. We manually examined the data for the quality of tweet texts and manually labeled a sample of each disaster set over every phase for relevance. We examined the relevance of each tweet text to the disaster phases. If the tweet text contains any crucial information related to disaster phases, such as "need", "water", "evacuate", "rescue", we label it as relevant. If the tweet text does not carry any crucial information, we label it as non-relevant. For example, a relevant message is "Storm getting stronger: 2 million urged to leave", where it has information about evacuation that is important for preparedness. However, a message like "We pray for those in the path of Hurricane Matthew. If you are in an area that may be affected by the disaster phases and…" is labeled as non-relevant. It is important to point out that during this initial step we are classifying for relevance only and not for the disaster phase. As we cannot manually label the huge number of tweets across the three disaster sets, we randomly sampled a smaller data set from each. Table 2 shows the sampled data sets across the three disaster phases of the three hurricanes. Our initial plan was to sample the same number of tweets from each data set over each phase, but some data sets contain a lot of noise and repeated tweets, which explains the lower number of tweets for some sets.
However, we feel that we have captured the three disaster stages of a hurricane disaster with this sample data set, as this is our focus.

4.2 Relevance Classification Model

We have utilized Microsoft Azure Machine Learning Studio (https://azure.microsoft.com/en-us/services/machine-learning-studio/) to conduct our experiment, as we plan on having a cloud-based framework and because Azure Machine Learning Studio offers a large number of classification and text analytics models that can be easily tuned for performance. We combined the three data sets into one.

Table 2. Sampled data set for disaster relevance classification
Disaster phase | Hurricane Matthew | Hurricane Harvey | Hurricane Irma | Total
Preparedness | 200 relevant / 200 non-relevant | 200 relevant / 157 non-relevant | 200 relevant / 106 non-relevant | 600 relevant / 463 non-relevant
Response | 188 relevant / 109 non-relevant | 130 relevant / 50 non-relevant | 191 relevant / 105 non-relevant | 509 relevant / 264 non-relevant
Recovery | 171 relevant / 74 non-relevant | 31 relevant / 16 non-relevant | 126 relevant / 110 non-relevant | 328 relevant / 264 non-relevant
Total | 559 relevant / 383 non-relevant | 361 relevant / 178 non-relevant | 517 relevant / 321 non-relevant | 1437 relevant / 927 non-relevant

We cleaned and removed missing data based on the text and other important fields, which resulted in 2311 tweets: 1434 relevant and 877 non-relevant. We preprocessed the data by removing special characters, URLs and user mentions (for privacy). We kept numbers, as they are important for the hurricane category, number of casualties, addresses, etc. We tokenized, stemmed and removed stop words.

4.3 Binary Classification Algorithm

The work of Stowe et al. [2] showed that logistic regression with uni-gram features and cross-validation achieved the best accuracy on binary classification for tweet relevance. Habdank et al. [10] pointed out that uni-grams achieve better accuracy than bi-grams in tweet text classification for relevance, as shown in other researchers' experiments. We have also experimented with binary classification algorithms on Twitter data in previous work, including logistic regression, support vector machines, Naïve Bayes and the Stanford classifier, and found that logistic regression with uni-gram features gave us the best accuracy. We applied the TF-IDF (term frequency-inverse document frequency) weighting function to uni-gram counts, which gives higher weight to words that appear frequently in a single record but are rare across the entire dataset. We used filter-based feature selection to reduce the dimensionality and chose 1000 features, with chi-squared as the score function to calculate the correlation between the label column and the text vector. We split the data into 70% training and 30% testing. For parameter tuning, we split the testing data 50% for parameter tuning and 50% for scoring. We also used 10-fold cross-validation to alternate between training and testing data and to assess both the variability of the dataset and the reliability of the training model.

4.4 Evaluation Measurement

Accurately classifying relevant tweets during an emergency is critical, which makes measuring classifier performance an important step. A tweet can be a matter of saving or losing a life if it is not correctly classified as relevant. Habdank et al. [10] explained how accuracy and recall are very important evaluation measures. The higher the recall value, the fewer relevant tweets have been falsely marked as negative.
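To make the setup of Sects. 4.3 and 4.4 concrete, a minimal sketch is shown below; scikit-learn is used here as a stand-in for the Azure ML Studio modules, the Azure-specific parameter-tuning split is omitted, and the data file and column names are illustrative assumptions rather than the study's actual artifacts.

```python
# Minimal sketch of the relevance pipeline (uni-gram TF-IDF, chi-squared selection of
# 1000 features, two-class logistic regression, 10-fold cross-validation).
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Hypothetical file: columns 'text' (preprocessed tweet) and 'relevant' (0/1 label).
tweets = pd.read_csv("labeled_relevance_tweets.csv")

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1), stop_words="english")),  # uni-gram TF-IDF
    ("select", SelectKBest(chi2, k=1000)),                                 # filter-based feature selection
    ("clf", LogisticRegression(max_iter=1000)),                            # two-class logistic regression
])

# 10-fold cross-validation, reporting the measures discussed in Sect. 4.4
scores = cross_validate(pipeline, tweets["text"], tweets["relevant"], cv=10,
                        scoring=["accuracy", "precision", "recall", "f1"])
for metric in ("accuracy", "precision", "recall", "f1"):
    print(metric, round(scores["test_" + metric].mean(), 3))
```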
Precision and F1 score are also significant measures: precision reflects the rate of false positives, and the F1 score is the harmonic mean of precision and recall. In our experiment we focused on accuracy as the main evaluation metric, in addition to recall, precision and F1 score. Table 3 shows the logistic regression results across different feature techniques. The best accuracy we obtained was around 0.86, using 10-fold cross-validation and uni-grams with TF-IDF, which is slightly better than the 0.856 achieved by Stowe et al. [2]. This shows that tweets from multiple data sets over different disaster phases for a certain disaster type can be classified for relevance with an accuracy similar to, and slightly better than, that achieved on a single data set, which helps in building a classifier that is more general for a certain disaster type such as hurricanes.

5 Disaster Phase Discovery

Once the tweets are classified for relevance, we need to identify the disaster phase from the relevant tweets. We focus our work on the three main disaster phases of preparedness, response and recovery, as these are the three main phases that most natural disasters, especially hurricanes, go through. We have experimented with LDA (Latent Dirichlet Allocation) for topic discovery on unlabeled data and with multi-class classification on labeled data. The following sections describe our findings.

5.1 LDA for Disaster Phase Discovery on Unlabeled Data

LDA uses a generative approach on unlabeled data. The algorithm generates a probabilistic model that is used to identify groups of topics, which can then be used to classify either existing training cases or new cases. It uses the distribution of words to mathematically model topics [11]. The topic model gives us two major pieces of information for any collection of documents: (1) the number of topics contained within a corpus; and (2) for each document in the corpus, what proportion of each topic is contained within that document [12]. It is important to note that during a disaster, tweets will usually be coming from one phase at a given time, with some overlap. Based on this, we are not using LDA to uncover the disaster phase as the disaster unfolds in real time; rather, we identify the disaster phase from static data to support disaster phase discovery. Based on similar terms among the disaster phases across the three different disaster sets, we can potentially label the data. In LDA, every topic is a collection of words. Each topic contains all the words in the corpus with a probability of the word belonging to that topic. LDA finds the most probable words for a topic; associating each topic with a theme is left to the user. The LDA approach requires careful validation of the topical clusters.

Table 3. Results of binary classification for disaster relevance
Binary classification | Average accuracy | Precision | Recall | F1 score
Two-class logistic regression, uni-gram with TF-IDF, cross-validation | 0.858 | 0.868 | 0.90 | 0.886
Two-class logistic regression, uni-gram, feature selection, parameter tuning, cross-validation | 0.852 | 0.857 | 0.91 | 0.884
Two-class logistic regression, uni-gram with TF-IDF | 0.841 | 0.852 | 0.90 | 0.876
Two-class logistic regression, uni-gram with feature selection and parameter tuning | 0.835 | 0.85 | 0.893 | 0.871

We applied LDA in Azure Machine Learning Studio on the relevant tweets.
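A minimal sketch of this step is shown below, assuming scikit-learn's LatentDirichletAllocation in place of the Azure LDA module and a hypothetical list of preprocessed relevant tweet texts; the choice of the number of topics is discussed next.

```python
# Minimal LDA sketch (scikit-learn stands in for the Azure module; `relevant_texts`
# is a placeholder list of preprocessed relevant tweet texts).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

relevant_texts = [
    "drone footage naples florida shows complete devastation hurricane irma",
    "nc dps state ready hurricane irmas effects reach north carolina",
]  # placeholder data

# Uni-gram counts, as in the experiments
vectorizer = CountVectorizer(ngram_range=(1, 1), stop_words="english")
counts = vectorizer.fit_transform(relevant_texts)

# One topic per disaster phase (preparedness, response, recovery)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)   # per-tweet topic proportions (cf. Table 4)

# Most probable words per topic; associating a topic with a phase is left to the user.
terms = vectorizer.get_feature_names_out()   # requires a recent scikit-learn release
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:10]]
    print("topic", k + 1, ":", ", ".join(top))
```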
In LDA, an important parameter that needs to be identified is the number of topics. We experimented with different numbers of topics and different data sets to find the best topic discovery for the disaster phases. When we applied LDA to one data set, such as hurricane Irma, which covers the three disaster phases, with the number of topics set to 3 and uni-grams, we got good separation by disaster phase. Table 4 shows a sample of the results, where we can identify topic 1 with assessment and recovery, topic 2 with response, and topic 3 with preparedness and updates. However, when we applied LDA to the general data set for the three hurricanes we got mixed results, and as we increased the number of topics we could see subcategories of the disaster emerge more clearly, such as warning, update, and death. We are convinced that LDA can be a good choice for identifying the disaster phase on a single data set, but it does not perform well on a more diverse data set.

5.2 Multi-class Classification for Disaster Phase Discovery on Labeled Data

As LDA did not accurately identify the three disaster phases, we applied multi-class classification to the relevant tweets to classify them by disaster phase. We combined data sets from the three different disasters covering the three disaster phases of preparedness, response and recovery to obtain a well-balanced data set. Only relevant tweets were taken, with phase label 1 for preparedness, 2 for response, and 3 for recovery. The data was labeled manually based on the disaster phase. We acquired a balanced data set with a total of 981 relevant tweets, consisting of 327 tweets from each disaster phase across the three different disasters. The data was preprocessed in the same way as for binary classification and split into 70% training and 30% testing. We performed the experiment in Azure Machine Learning Studio. We identified several multi-class classification algorithms to evaluate for accuracy, based on recommendations from the work of Huang et al. [13] and the Azure machine learning documentation [14]. The classifiers were chosen based on their known high accuracy for multi-class text classification. Table 5 provides the results of the multi-classifiers on the data set.

Table 4. Sample topics identified by LDA on the hurricane Irma data set
Tweet text | Topic 1 | Topic 2 | Topic 3
drone footage naples florida shows complete devastation hurricane irma | 0.997509 | 0.001245 | 0.001245
hurricane irma 10 dead cuba record flooding hits northern florida latest news | 0.000831 | 0.998337 | 0.000831
nc dps state ready hurricane irmas effects reach north carolina | 0.000997 | 0.000997 | 0.998006

We can see that both neural networks with uni-gram feature hashing and parameter tuning and two-class logistic regression with a one-vs-all multi-classifier gave an average accuracy of 85% and an average recall of 78%. Comparing our results to previous work on multi-class text classification: Stowe et al. [2] performed binary classification on the fine-grained subcategories of disaster tweets, with a best feature precision of around 0.71 and recall around 0.80; Huang et al. [13] applied logistic regression binary classification to the fine-grained subcategories of the disaster and obtained an overall precision of 0.647 and recall of 0.711. Our results show that we can achieve an average accuracy of 0.85 on the more general task of disaster phase discovery rather than fine-grained subcategories. This shows that relevant tweets can be classified for disaster phase discovery with good accuracy.
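A minimal sketch of this multi-class setup is shown below, assuming scikit-learn's HashingVectorizer and a one-vs-all logistic regression in place of the Azure modules (parameter sweeping is omitted); the data file and column names are hypothetical.

```python
# Minimal multi-class sketch of the phase classifier (Sect. 5.2).
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical file: columns 'text' and 'phase' in {1: preparedness, 2: response, 3: recovery}
data = pd.read_csv("relevant_tweets_phases.csv")

# Uni-gram feature hashing followed by two-class logistic regression in a one-vs-all scheme
model = Pipeline([
    ("hash", HashingVectorizer(ngram_range=(1, 1), alternate_sign=False)),
    ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])

X_train, X_test, y_train, y_test = train_test_split(
    data["text"], data["phase"], test_size=0.3, stratify=data["phase"], random_state=0)

model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test),
                            target_names=["preparedness", "response", "recovery"]))
```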
6 Knowledge Extraction

6.1 Location

After tweets are classified for relevance and disaster phase, useful information needs to be extracted. One key piece of information is the location of the disaster. Tweets can be geo-tagged by the user to indicate where the tweet is coming from. This information is represented in the coordinates field of the tweet, which is in geoJSON form (longitude first, then latitude). For example: "coordinates": {"coordinates": [-75.14310264, 40.05701649], "type": "Point"}. The problem is that not all tweets are geo-tagged: in our data, for example, in a combined hurricane Matthew and Harvey data set of 1973 tweets, only 1% of the tweets are geo-tagged. Another field in which a user can share a location is the place field, which, when present, indicates that the tweet is associated with, but not necessarily originating from, a place. In the same data set, only 5% of the tweets are associated with a place. Extracting location from text therefore helps in identifying the main areas affected by the disaster [15]. In the following sections we present the extraction of tweet location from the text, coordinates and place fields of a tweet object.

Table 5. Results of multi-class text classification for disaster phase identification
Multi-classifier algorithm | Average accuracy | Overall accuracy | Micro-average precision | Macro-average precision | Micro-average recall | Macro-average recall
Neural network, uni-gram feature hashing, parameter sweeping | 0.85 | 0.775 | 0.775 | 0.777 | 0.775 | 0.775
Two-class logistic regression with one-vs-all multi-classifier, uni-gram feature hashing, parameter sweeping | 0.85 | 0.775 | 0.775 | 0.775 | 0.775 | 0.775
Multi-class decision forest with feature hashing, parameter sweeping | 0.845 | 0.768 | 0.768 | 0.77 | 0.768 | 0.768

6.1.1 Text-Based Location Extraction

Our data set consists of 981 relevant tweets, with 327 tweets from each disaster phase across the three hurricane disasters of Matthew, Harvey and Irma. Given the lack of an originating location in the coordinates field for most tweets, we examined the tweet text to extract the location. We applied the Named Entity Recognition module in Azure Machine Learning Studio [16], which identifies the names of entities in text such as people, companies, locations, etc. Figure 2 presents the locations extracted from the tweet text. We can see that the extracted location names are associated with the hurricanes' actual locations. For example, hurricane Matthew was targeting Florida, Haiti, North Carolina and South Carolina; hurricane Harvey was targeting Houston, Texas; and hurricane Irma was targeting South Carolina, North Carolina and Florida. We can also identify the name of the hurricane, such as Harvey, Matthew or Irma. This gives a holistic view of where the disaster is happening; for a precise location, the geo-tagged coordinates field gives the exact position.

6.1.2 Coordinates and Place Fields Location Extraction

In this section we present a holistic approach to uncover a disaster location from the three tweet fields of text, coordinates and place, and compare the results for consistency. Our data set consists of 121,658 tweets of hurricane Harvey during the response phase. We extracted the latitude and longitude from the coordinates field of the geo-tagged tweets and uploaded them to Google Maps for visual representation using Google Fusion Tables.

Fig. 2. Extracted locations from relevant tweet text for the three disasters Matthew, Harvey and Irma (sampled data set).
Figure 3 shows the geo-tagged tweets on a Google map for the hurricane Harvey dataset during the response phase, and how they mainly originate from Houston, TX, the main affected area during the hurricane. The place field in a tweet object consists of subfields such as country, country_code, name and place_type, all within bounding box coordinates. We extracted those subfields from the tweets for which the user decided to share a place. Again, the place is not necessarily where the tweet originated. Figure 4 shows the city names based on the place name field in the same data set. Around 4000 tweets associate Houston with the tweet place, and about 1500 tweets associate Texas with the tweet place. This indicates that the disaster place is associated with Houston, Texas. To compare these results with the text-extracted location on the same data set, we applied the Named Entity Recognition module in Azure to the tweet text, which resulted in around 6000 mentions of Houston and about 4200 mentions of Texas, as shown in Fig. 5. Comparing the results of the coordinates, place and text field extraction shows that they are consistent, which confirms that the disaster is mainly affecting Houston, TX.

Fig. 3. Coordinates of geo-tagged tweets of hurricane Harvey, response phase data set.

Fig. 4. Extracted location from the tweets' place field for hurricane Harvey, response phase data set.

Fig. 5. Extracted location from tweet text for hurricane Harvey, response phase data set.

We can also see from the results that many of the disaster location names come from the tweet text rather than the place and coordinates fields, confirming that tweet texts carry key information during a disaster phase. Other fields, such as the user profile, could also be used to extract a location; this is not necessarily where the tweet originated, but the correlation can be studied in future work.

6.2 Key Knowledge Extraction

For key knowledge extraction, we experimented with both term frequency and the Key Phrase Extraction module in Azure [17]. Our data set consists of 981 tweets from the three different disaster sets with a balanced distribution among the disaster phases, i.e., roughly 33% of the tweets for each disaster phase of preparedness, response and recovery. For term frequency, we created the matrix of terms using R in Azure for each disaster phase. The preparedness phase resulted in 693 key terms. Table 6 shows the top key terms for each disaster phase. For key phrase extraction, the module is a wrapper around a natural language processing API for key-phrase extraction. Phrases are analyzed as potentially meaningful in the context of the sentence for various reasons, such as whether the phrase captures the topic of the sentence and whether it contains a combination of modifier and noun that indicates sentiment. The output is a dataset containing comma-separated key phrases from the text. Figure 6 gives the output of applying the module to the preprocessed data set for each of the disaster phases. Comparing the two outputs, we can see the similarity among the key terms for each disaster phase. The key phrase module gives us more meaningful, high-frequency phrases for the disaster phase, which will be very helpful for disaster personnel. Term frequency can be used as a complementary module for verification and for building a key term dictionary for the disaster phase.
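A minimal sketch of the term-frequency side of this step is shown below; scikit-learn's CountVectorizer stands in for the R term matrix used in the study, and `tweets_by_phase` is a hypothetical mapping from phase name to tweet texts.

```python
# Minimal per-phase key-term frequency sketch (Sect. 6.2).
from sklearn.feature_extraction.text import CountVectorizer

tweets_by_phase = {  # placeholder data
    "preparedness": ["brace for hurricane matthew evacuation ordered", "..."],
    "response":     ["rescue teams help flood victims in houston", "..."],
    "recovery":     ["storm damage and power outages after irma", "..."],
}

for phase, texts in tweets_by_phase.items():
    vectorizer = CountVectorizer(ngram_range=(1, 1), stop_words="english")
    counts = vectorizer.fit_transform(texts)
    totals = counts.sum(axis=0).A1                      # corpus-wide frequency of each term
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, totals), key=lambda t: -t[1])
    print(phase, "top terms:", [term for term, _ in ranked[:10]])
```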
The hurricane names and locations can be stripped out to generalize the dictionary terms to any disaster.

Table 6. Key terms for each disaster phase based on tweet text term frequency
Disaster phase | Top key words in order
Preparedness | Hurricane, storm, Matthew, Harvey, Irma, Florida, category, Haiti, coast, Texas, disaster, death, wind, strengthen, toll, mph, dead, brace, deadly, surge, barrel, hit, near, Caribbean, news, atlantics, evacuation, head, immense, intensify, prepare, suffer, update, order, safe, threaten, approach, flee, flood, declare, expect
Response | Hurricane, Matthew, storm, Florida, Irma, flood, help, key, coast, surge, landfall, wind, Harvey, Houston, batter, category, power, Jacksonville, rain, hit, people, Carolina, foot, downgrade, feel, victim, weaken, need, help, rescue, kill, shelter, emergency, fear, relief, deadly, death, threaten, damage
Recovery | Hurricane, Matthew, storm, Irma, Carolina, north, state, flood, Florida, death, destruction, major, face, governor, rain, fatality, Houston, toll, damage, power, leave, surge, hit, devastation, cholera, expect, river, destructive, head, outbreak, cause, effect, collapse

Fig. 6. Top key phrases for the preparedness, response and recovery disaster phases, in order from left to right.

7 Conclusion and Future Work

In this paper we proposed a general framework for a cloud-based Twitter analytics platform for disaster relevance identification and disaster phase discovery. We examined three major hurricanes and focused specifically on studying three main disaster phases: disaster preparedness, disaster response, and disaster recovery. Our proposed system consists of three main Twitter analytics components: relevance classification, disaster phase classification and knowledge extraction. Our experiments demonstrate that we can build a general classifier with good accuracy, around 86%, to classify relevant tweets from a hurricane disaster. Disaster phase discovery using multi-class text classification turns out to be a better choice for uncovering the three main disaster phases than LDA, which gives mixed results depending on data set size and diversity. We were able to classify the disaster phases of preparedness, response and recovery using a multi-classifier with an accuracy of around 85%. Relevant tweets for a certain disaster phase carry important information for emergency management personnel. We extracted the disaster location name from the tweet text and from the geo-tagged coordinates and place fields. As the number of geo-tagged tweets is usually very limited, the extracted text-based location becomes helpful in identifying the general location of a disaster. We have also extracted the key phrases and key terms for each disaster phase, which can be used to uncover more fine-grained categories and potentially build a disaster phase key term dictionary.

Our study is limited in scope to the use of existing classification algorithms for Twitter text classification of relevance and disaster phase discovery on static hurricane disaster data. We focused on extracting meaningful disaster knowledge from tweet text. However, there is more disaster information that needs to be extracted, including the disaster time and the disaster scale for assessment and recovery. Novel approaches will be needed to uncover those areas from other tweet fields in addition to the text field.
As we continue working on this framework, we plan to have a general Twitter platform that can be utilized in a cloud-based disaster management application as a service. The platform needs to be general enough to allow for dynamic requirements updates through a micro-service architecture. Identifying relevant tweets in real time is another goal, as we plan on implementing the system for real-time streamed data. We would like to test our work on various disasters from different domains, which will help in discovering similarity among the different disasters and disaster phases via key words or other similarity measures. Through our work, we also see a need for novel labeling mechanisms for Twitter data based on text context. Presenting the extracted information about the disaster in a user-friendly or standard format is another area to work on.

Acknowledgements. Special thanks to Dr. Farnoush Banaei-Kashani, University of Colorado Denver. This work is supported by the Department of Education GAANN Program, Fellowship # P200A150283, focused on Big Data Science and Engineering.

References
1. Win, S.S.M., Aung, T.N.: Target oriented tweets monitoring system during natural disasters. In: 16th IEEE/ACIS International Conference on Computer and Information Science (ICIS), pp. 143–148. IEEE, Wuhan (2017)
2. Stowe, K., Paul, M.J., Palmer, M., Palen, L., Anderson, K.: Identifying and categorizing disaster-related tweets. In: The Fourth International Workshop on Natural Language Processing for Social Media, pp. 1–6. Association for Computational Linguistics, Austin (2016)
3. Vieweg, S.E.: Situational awareness in mass emergency: a behavioral and linguistic analysis of microblogged communications. Doctoral dissertation, University of Colorado at Boulder, Boulder, CO (2012)
4. Ashktorab, Z., Brown, C., Nandi, M., Culotta, A.: Tweedr: mining Twitter to inform disaster response. In: Hiltz, S.R., Pfaff, M.S., Plotnick, L., Shih, P.C. (eds.) 11th International ISCRAM Conference, pp. 354–358. The Pennsylvania State University, Pennsylvania (2014)
5. Imran, M., Castillo, C., Lucas, J., Meier, P., Vieweg, S.: AIDR: artificial intelligence for disaster response. In: 23rd International Conference on World Wide Web, pp. 159–162. ACM, Seoul (2014)
6. Imran, M., Elbassuoni, S., Castillo, C., Diaz, F., Meier, P.: Practical extraction of disaster-relevant information from social media. In: 22nd International Conference on World Wide Web, pp. 1021–1024. ACM, Rio de Janeiro (2013)
7. Wang, Z., Ye, X.: Social media analytics for natural disaster management. Int. J. Geogr. Inf. Sci. 32(1), 49–72 (2018)
8. Haworth, B., Bruce, E., Middleton, P.: Emerging technologies for risk reduction: assessing the potential use of social media and VGI for increasing community engagement. Aust. J. Emerg. Manag. 30(3), 36 (2015)
9. Yan, Y., Eckle, M., Kuo, C.L., Herfort, B., Fan, H., Zipf, A.: Monitoring and assessing post-disaster tourism recovery using geotagged social media data. ISPRS Int. J. Geo-Inf. 6(5), 144 (2017)
10. Habdank, M., Rodehutskors, N., Koch, R.: Relevancy assessment of tweets using supervised learning techniques: mining emergency related tweets for automated relevancy classification. In: 4th International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), pp. 1–8. IEEE, Münster (2017)
11. Latent Dirichlet Allocation. https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/latent-dirichlet-allocation. Accessed 02 Feb 2018
12. Anastasopoulos, L.J., Moldogaziev, T.T., Scott, T.A.: Computational Text Analysis for Public Management Research: An Annotated Application to County Budgets (2017)
13. Huang, Q., Xiao, Y.: Geographic situational awareness: mining tweets for disaster preparedness, emergency response, impact, and recovery. ISPRS Int. J. Geo-Inf. 4(3), 1549–1568 (2015)
14. Machine learning algorithm cheat sheet for Microsoft Azure Machine Learning Studio. https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-cheat-sheet. Accessed 02 Feb 2018
15. Spielhofer, T., Greenlaw, R., Markham, D., Hahne, A.: Data mining Twitter during the UK floods: investigating the potential use of social media in emergency management. In: 3rd International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), pp. 1–6. IEEE, Vienna (2016)
16. Named Entity Recognition. https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/named-entity-recognition. Accessed 02 Feb 2018
17. Extract key phrases from text. https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/extract-key-phrases-from-text. Accessed 02 Feb 2018

Incorporating Code-Switching and Borrowing in Dutch-English Automatic Language Detection on Twitter

Samantha Kent and Daniel Claeser
Fraunhofer Institut FKIE, Fraunhoferstrasse 20, 53343 Wachtberg, Germany
{samantha.kent,daniel.claeser}@fkie.fraunhofer.de

Abstract. This paper presents a classification system to automatically identify the language of individual tokens in Dutch-English bilingual Tweets. A dictionary-based approach is used as the basis of the system, and additional features are introduced to address the challenges associated with identifying closely related languages. Crucially, a separate system aimed specifically at differentiating between code-switching and borrowing is designed and then implemented as a classification step within the language identification (LID) system. The separate classification step is based on a linguistic framework for distinguishing between borrowing and CS. To test the effectiveness of the rules in the LID system, they are used to create feature vectors for training and testing machine learning systems. The discussion centres on a Decision Tree Classifier (DTC) and Support Vector Machines (SVM). The results show that there is only a small difference between the rule-based LID system (micro F1 = .95) and the DTC (micro F1 = .96).

Keywords: Code-switching · Borrowing · Dutch · English · Twitter · Machine learning · Decision trees · SVM

1 Introduction

In the European Union, it is estimated that just over half of all European citizens are able to speak at least one other language in addition to their mother tongue [1]. Online micro-blogging platforms such as Twitter provide the perfect setting for multilingual communication, and Tweets containing Dutch and English, as in (1) below, are not uncommon.

(1) oke give me some reasons waarom jij denkt dat het real is
    'ok give me some reasons why you think it's real'

Currently, multilingual communication poses a challenge for Natural Language Processing (NLP) tasks such as Part-of-Speech tagging, machine translation, and Named Entity Recognition. Improving the ability to process multilingual communication is vital, as it will contribute to further solving these tasks.
Automatic language identification (LID) is the task of determining the language of a document, sentence or word. Language identification at Tweet level reaches accuracy levels of over 95% for many languages. Nevertheless, one of the reasons the language of a Tweet is incorrectly identified, aside from the marked Twitter language, is that it can contain code-switching. Code-switching (CS) is defined as "the alternation of two languages within a single discourse, sentence or constituent" [2]. CS can consist of multi-word utterances or single-word insertions. To determine whether or not a Tweet contains multiple languages, an analysis at token level needs to be conducted.

While there are many different LID methods, arguably one of the simplest approaches is based on a lexical lookup system. In this method, dictionaries, which are lists containing lexical items extracted from a particular language, are used to verify that a word is part of the lexicon of that language. This method was used as a starting point to identify the language of tokens in Spanish-English, German-Turkish, and Dutch-English Tweets [3]. The results suggested that the performance of a dictionary-based LID system is much better for language pairs that are not as closely related as Dutch-English. In the case of Dutch and English, many Dutch words are borrowed from English and have been integrated into the Dutch lexicon. The challenge, therefore, lies in determining whether the English words are in fact borrowed and part of a monolingual Tweet, or whether they are English words (CS) that are included in a multilingual Tweet. Without distinguishing between these two types of words, it is very difficult to accurately identify the language of tokens in sentences that contain both English and Dutch. Thus, in order to address this issue, this study presents a method for distinguishing between borrowed and code-switched English words in order to improve the overall language classification of tokens in Dutch-English Tweets. To do so, the method in this paper combines a LID system based on a dictionary lookup with a synonym detection method that identifies whether the token in question is code-switched or borrowed. Even though "words are seldom exactly synonymous" [4], comparing the use of a token and its possible synonyms provides an indication of how the token is integrated into a language.

2 Code-Switching and Borrowing

To fully understand CS, a distinction between CS and lexical borrowing needs to be made. Lexical borrowing is defined as "the incorporation of lexical items from one language in the lexicon of another language" and is, together with CS, one of the more prominent language contact phenomena [5, p. 189]. CS and borrowing are closely related in the sense that lexical items that were once classified as foreign-word CS may be absorbed into the lexicon of a host language over time [6]. Example (2) below illustrates that it is not always easy to determine whether a word should be identified as a foreign word or not.

(2) ik heb een video klaarliggen… een social test met mn docent, wanneer moet die online?
    'I have a video ready to go… a social test with my teacher, when should it go online?'
At first glance, it would seem as though 'social', 'test' and 'online' are all English words in this sentence. In fact, according to the Woordenlijst Nederlandse Taal (a word list that contains the correct spelling of current Dutch words, maintained by de Taalunie, http://woordenlijst.org/), the only word that is actually an English word is 'social', as the Dutch equivalent of this word is 'sociaal'. The other two words are identical to English but are also part of the Dutch lexicon. They should, therefore, not be identified as code-switching but instead as borrowing.

Numerous attempts have been made to distinguish between borrowing and code-switching. They range from establishing a set of specific criteria with which to identify borrowing and CS, to the assertion that there is no clear-cut distinction between the two. In the first view, one of the main distinguishing features between the two is the number of words: lexical borrowings consist of only one word, whereas CS can consist of multiple words [7]. Having said this, the difficulty in distinguishing between the two does not lie in the difference between single-word lexical borrowings and multi-word alternations, but rather between lexical borrowing and single-word CS inclusions. Table 1 provides a set of criteria to establish whether foreign inclusions can be classified as borrowing or CS [7]. These criteria are used as guidelines to differentiate between the two phenomena.

By delineating these criteria, the impression is given that there are only two possibilities for classifying a single-word inclusion: CS or borrowing. However, it is argued that this strict separation of the two phenomena is not always possible and there are many exceptions that do not fall into either category. Instead of strictly differentiating between the two, CS and borrowing can be viewed as a continuum where the canonical forms of CS and borrowing are placed at either end of the spectrum [8]. This continuum makes it possible to account for tokens that may not be precisely in either stage, but are instead transitioning into becoming fully-fledged loanwords. The definition of borrowing adopted in this paper is that borrowed words are words that stem from a foreign language and have been integrated into the lexicon of a native language. In contrast, words that are classified as code-switching are not integrated. Rather than having to define a frequency at which a token is automatically classified as either CS or borrowing, the approach taken here relies on the difference between the frequency of the token and any possible alternatives in the native language. This ensures that instead of having to assign an arbitrary value, the individual difference between the tokens determines whether a word is CS or borrowing.

3 Related Work

Code-switching in Tweets was the topic of the shared task of the workshops on computational approaches to code-switching at the conference on Empirical Methods in Natural Language Processing (EMNLP) in 2014 and 2016. CS detection methods ranged from deep learning algorithms to traditional machine learning approaches and various dictionary-based approaches [9, 10]. The best result was obtained by [11] for Spanish-English, with an F1 score of 91.3%.
The performance of the submissions for the Arabic language pair ranges from an F1 of 66% to an F1 of 83% for the best-performing system [12]. The results suggest that the more similar a language pair is, the more difficult it is to accurately detect CS. To the best of our knowledge, there are currently only two studies that present a method for automatically identifying CS and borrowing on social media, and neither study incorporated its results into a LID method. The first focused on English-Hindi CS and on developing a method that automatically detects whether a foreign-language inclusion is CS or borrowing [13]. The method used is similar to the one in this paper, as the starting point is also the assumption that it is possible to distinguish between CS and borrowing by looking at the distribution of use of a foreign word in a native language. They achieve this by looking at the frequency of use of a token in a monolingual Hindi newspaper. Alternatively, [14] propose three different metrics to measure word usage: the Unique User Ratio (UUR), the Unique Tweet Ratio (UTR) and the Unique Phrase Ratio (UPR). Their overall micro precision/recall is 0.33 for the UUR metric, compared to a baseline of 0.19 established in [13].

It is clear from previous studies that multilingual text within one Tweet still provides a challenge for automatic language detection. The systems described above cite similar reasons for the misclassification of certain tokens. Firstly, the highly informal nature of Tweets makes it difficult to capture the language of all tokens. A second reason misclassification occurs is the presence of named entities, which complicates the LID task [15]. Thirdly, words that share the same spelling in both languages are difficult to detect [15, 16]. This particular challenge seems to increase the more similar the languages in the language pair are: it appears to be more difficult to detect the language of tokens when there is a high level of lexical overlap.

Table 1. Characteristics of borrowing and CS [5, 7]
Criteria | Borrowing | Code-switching
No more than one word | + | −
Phonological adaptation | + | ±
Morphological adaptation | + | −
Syntactic adaptation | + | −
Frequent use | + | −
Replaces own word | + | −
Recognized as own word | + | −

4 Resources

A Dutch-English code-switching corpus was created for the purpose of training and testing the classifier and was compiled with the aim of collecting as many Dutch Tweets containing English CS as possible. The corpus was compiled using the search function of the Twitter streaming API, with a specific language setting (Dutch) and specific search words used to find Tweets containing Dutch-English code-switching. The top 25 most frequently used Dutch words on Dutch Wikipedia, consisting solely of grammatical function words, were used as search terms. The language identification method presented in [3] was used to make a pre-selection of Dutch Tweets that are likely to contain English tokens. Based on these language tags, all Tweets with only Dutch or English tokens were separated from the Tweets that contain both Dutch and English tokens. It was necessary to select not only Tweets that were correctly identified by the LID system as CS, but also Tweets that were incorrectly identified, so as not to introduce a bias.
Therefore, some Tweets in the corpus contain only Dutch words that were mistakenly identified as English and are used to test the classifier's ability to recognize code-switched and borrowed tokens. The authors manually selected 1250 Tweets for annotation. The following four categories were used in the manual annotation of the Tweets:

• Dutch (NL) – This category consists of Dutch words. It also includes all Dutch words that are borrowed from English. Particular attention is paid to the annotation of borrowed words; because they are often overlooked and easily annotated incorrectly as English, these words were double-checked in the Dutch word list.
• English (EN) – All English words are labelled as English. If there is doubt about whether a word is English or Dutch, the same criteria as described in the Dutch category are applied.
• Social Media Token (SMT) – It proved useful to create a separate category for all social media related tokens [16]. It includes all tokens that are specifically related to Twitter, such as at-mentions containing people's usernames, hashtags and URLs, but also tokens such as 'hahahah', 'lol' or 'aww'.
• Ambiguous (AM) – This category includes tokens that cannot be categorized as belonging to a particular language. Similarly to the SMT category, these tokens are used in both languages and are thus considered language independent. For example, company names such as Twitter or Google, as well as the names of places and people, are categorized as ambiguous.

The annotation was conducted by a native speaker of both Dutch and English, and a second native speaker annotated 100 randomly selected Tweets to check the accuracy of the annotation. A comparison of the Tweets annotated by both annotators shows a high inter-rater agreement (Cohen's Kappa = 0.949). 1000 Tweets were used as training material and 250 Tweets were used to test the classifier. An overview of the distribution of Tweets in the training and testing sets is given in Table 2 below. Note that while the category ambiguous (AM) has been included for the purpose of completeness, it is not taken into account in any further classification or analysis.

The synonym dictionaries used in the LID system stem from three different sources. The first dictionary was obtained from Open Dutch WordNet [17], a lexical semantic database containing 117914 synonym sets, of which 51588 sets contain at least one Dutch synonym. The second dictionary is from a Dutch language foundation called Open Taal (http://data.opentaal.org/opentaalbank/woordrelaties/), which provides language resources for the creation of Dutch language software. The final dictionary was created using Dutch Wiktionary (https://nl.wiktionary.org/wiki/Hoofdpagina). The synonyms for each of the Dutch entries in the dictionaries were extracted and used to compile a specific synonym dictionary. The addition of multiple synonym dictionaries not only increases the number of synonym sets but also means that entries can be cross-checked.

The word frequency dictionaries were created using the Wikipedia dumps for Dutch and English (version: "all pages with edit history" on 01/03/2017). This particular version contains the pages themselves and a user discussion section where Wikipedia users may comment on the page content. This means that the dictionary contains both formal and informal language, as well as a wide range of vocabulary from different topics.
The word list was created by stripping the raw input of all special characters, tokenizing the sentences, and sorting the tokens according to their rank. The rank lists were cut at five million types, because entries below that point consist of words with a frequency of one. The Social Media Token (SMT) word list consists of a combination of different elements. The SMT list provided in [16] forms the basis of the list used here, supplemented by two additional resources. Firstly, the addition of an emoticon list from Wikipedia allows tokens such as "xD" to be captured. Secondly, a list of onomatopoeic words, such as 'haha' and 'pff', retrieved from the training corpus was also added. To ensure that as many of these tokens as possible are identified as SMT, the list is extended to include various different forms of the same token. This means that alongside 'haha' and 'pff', 'hahahah' and 'pffff' were also added.

5 Classification

In this section, the classification process is described. Section 5.1 contains an overview of the rule-based system, whereas Sect. 5.2 describes how the features derived from the classification rules are extracted for use in various machine learning classifiers.

Table 2. Number of tokens in each of the four categories in the annotated Tweet training and testing sets
Category | No. of tokens in training set | No. of tokens in testing set
Dutch (NL) | 73% (n = 10637) | 73% (n = 2680)
English (EN) | 15% (n = 2220) | 17% (n = 612)
Social Media Token (SMT) | 9% (n = 1281) | 9% (n = 341)
Ambiguous (AM) | 3% (n = 438) | 1% (n = 41)
Total | 14576 | 3674

5.1 Rule-Based LID System

The notion of word frequency plays a central role in the design of the system. It is assumed that the Dutch and English word frequency dictionaries are large enough for all tokens to be present in both dictionaries. Crucially, the rank of a token will differ between the two, as it will be more frequent in its language of origin than in the other dictionary. Thus, in the first step of the LID system, a token is assigned a language tag based on whether it ranks higher (i.e., is more frequent) in the Dutch or the English dictionary. In the rare instance that a token is not present in either of the dictionaries, it is assigned the tag 'none'. As a final step, all 'none' tags are tagged as the majority language (NL) of the Tweet.

Aside from the binary classification into Dutch or English, tokens that are specific to social media also need to be taken into account. Tweets contain many additional tokens, such as @-mentions, hashtags, and abbreviations, which do not strictly belong to either of the two languages. To account for these tokens, an additional rule based on Social Media Tokens (SMT) is introduced. Once the initial classification based on the rank information is made, an additional lookup is performed in an SMT word list. Without this list, almost all of the SMT tokens would be tagged as English, simply because they are more frequent in the English rank dictionary than in the Dutch one. All tokens present in this SMT list are tagged as such and are excluded from any further steps or rules in the LID system.

The lexical overlap between Dutch and English means that it is challenging to capture the language of tokens that are orthographically identical in both languages. For example, the word "school" is used in both Dutch and English and should therefore also be classified as such.
However, if the word "school" has a rank of 615 in the Dutch dictionary and a rank of 325 in the English dictionary, the classifier will tag the word as English. If the LID system consisted only of a basic dictionary lookup without any additional rules, all Dutch occurrences of the word would be misclassified. In order to account for these tokens, two additional rules have been incorporated into the classifier.

The first additional rule is the inclusion of a synonym detection method to determine whether a token is code-switched or borrowed. To start, the token being classified is matched to an equivalent synonym in the Dutch synonym dictionary. If there is no match for the token, and therefore no synonym, the token is classified as English. If there is a match, the token is classified as Dutch under the following two conditions:

• If the rank of the original English token is higher than that of the selected synonym in the Dutch word frequency dictionary, the token is tagged as Dutch and therefore is borrowed. For example: 'soul' (rank = 6914) vs. 'ziel' (rank = 7291).
• If the difference in ranks between the original English token and the selected synonym is less than 30,000, the token is tagged as Dutch and therefore is also borrowed. For example: 'power' (rank = 4092) vs. 'macht' (rank = 1316).

The maximum rank distance of 30,000 was determined iteratively using a list of English words from the training data that could potentially be borrowing or CS. To select the corresponding synonym, the original English token is compared to each of the synonym sets, and if the token is present in a set, its synonyms are added to a match list. Once the match lists have been created, the correct synonym is selected using a process of elimination. In the first step, the synonym that occurs most frequently as a synonym match is selected. Secondly, if there is a tie, the synonym with the highest rank in the Dutch language dictionary is selected. The information obtained from the synonym dictionaries only outweighs the frequency information gained in step one if there is an actual synonym match; otherwise, the classifier assigns the original tag.

The second additional rule considers the context of a token. It applies to tokens where the token is in one language and both the preceding and the following token are in another language. In these cases, the token is assigned a language tag that matches the language of the surrounding tokens. For example, if token 'n' is Dutch and tokens 'n − 1' and 'n + 1' are English, it is possible that the middle token 'n' is, in fact, English and should be reassigned as such. An essential addition to this rule is that it only comes into effect when the ranks of the token are sufficiently similar in the Dutch and English frequency dictionaries (Fig. 1). If a maximum rank distance is not set, all tokens will be reassigned to match their context and all one-word code-switches could be incorrectly classified and lost. After a distance of 1000 ranks, English recall starts to decrease considerably. Therefore, in order to optimize the identification of the English tokens, the rank distance has been set to a maximum of 1000.

To summarize, the steps in the LID system are as follows (a minimal code sketch follows this list):

• Base rule: dictionary lookup using the rank information in the Dutch and English Wikipedia dictionaries.
• Base rule: SMT lookup.
• Additional rule 1: synonym dictionary lookup.
• Additional rule 2: the context rule.
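The sketch below implements the rules as described in Sect. 5.1; the function and variable names are illustrative, the synonym-selection process is abstracted into a precomputed token-to-synonym mapping, and lower rank numbers are taken to mean higher frequency.

```python
# Minimal sketch of the rule-based LID steps; all names are illustrative.
MAX_SYNONYM_DISTANCE = 30_000   # borrowing rule threshold (additional rule 1)
MAX_CONTEXT_DISTANCE = 1_000    # context rule threshold (additional rule 2)

def base_tag(token, rank_nl, rank_en, smt_list):
    """Base rules: SMT lookup, then rank comparison in the two frequency dictionaries."""
    if token in smt_list:
        return "SMT"
    r_nl, r_en = rank_nl.get(token), rank_en.get(token)
    if r_nl is None and r_en is None:
        return "none"                      # later mapped to the Tweet's majority language (NL)
    if r_en is None or (r_nl is not None and r_nl < r_en):
        return "NL"                        # lower rank number = more frequent
    return "EN"

def synonym_rule(token, rank_nl, synonyms):
    """Additional rule 1: tag an EN-tagged token as borrowed (NL) via its Dutch synonym."""
    syn = synonyms.get(token)              # best synonym chosen by the elimination process
    if syn is None:
        return "EN"                        # no synonym match: code-switching
    token_rank = rank_nl.get(token, float("inf"))
    syn_rank = rank_nl.get(syn, float("inf"))
    if token_rank < syn_rank:              # token used more often than its Dutch synonym
        return "NL"
    if abs(token_rank - syn_rank) < MAX_SYNONYM_DISTANCE:
        return "NL"
    return "EN"

def context_rule(tags, tokens, rank_nl, rank_en):
    """Additional rule 2: reassign a token surrounded by the other language,
    but only if its Dutch and English ranks are sufficiently similar."""
    for i in range(1, len(tokens) - 1):
        if tags[i] == "SMT" or tags[i - 1] == "SMT":
            continue
        if tags[i - 1] != tags[i + 1] or tags[i] == tags[i - 1]:
            continue
        r_nl = rank_nl.get(tokens[i], float("inf"))
        r_en = rank_en.get(tokens[i], float("inf"))
        if abs(r_nl - r_en) <= MAX_CONTEXT_DISTANCE:
            tags[i] = tags[i - 1]
    return tags
```

In the full system, the information behind these rules is then converted into the feature vectors described next.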
5.2 Machine Learning

The four steps in the LID system have been converted into numeric vectors for use as input to the classifiers in scikit-learn 0.18. This allows the system to be tested in a formal classification framework and exported for further use. The resulting vector has four features: rank EN, rank NL, SMT, and synonym rank, each corresponding to the information derived from the rule-based LID system described in Sect. 5.1. Rank EN, rank NL and synonym rank are all integers containing the absolute ranks retrieved from the language dictionaries. For the SMT feature, we converted the Boolean presence/absence in the social media token list into an integer of 0 or 1. A second variation of the vector was also tested: the absolute synonym rank was replaced with the difference in ranks between the token in question and its corresponding synonym, with all other vector dimensions remaining the same. The difference between the first and second versions of the vector is that in the first, the rank difference between the token and the synonym is represented only implicitly; the information is inherent in the synonym rank and the rank of the Dutch token and is thus already in the vector. In the second version, the difference in ranks is explicitly added as a feature. This distinction was made to allow the classifiers to be trained on different information and to see whether they would learn the rank difference without being explicitly given the information. We trained and tested eight different classifiers using 10-fold cross-validation, the results of which can be found in Table 3 below. The two best classifiers are discussed in more detail in the following section.

6 Evaluation

In this section, the results for the LID system and the best-performing classifier, the Decision Tree classifier, are presented. Additionally, in Sect. 6.2, the code-switching and borrowing detection rule is evaluated separately.

6.1 General Evaluation

The LID system and the Decision Tree Classifier were evaluated on a held-out set of 250 Tweets, and the results are very similar. The precision, recall and F1 for the individual categories NL, EN, and SMT in the Decision Tree classifier are shown in Table 4 below. The best result is NL, with an F1 of 97.19%, followed by SMT and EN with F1 scores of 96.47% and 88.73% respectively. Compared to the LID system, both precision and recall for NL and EN improved. The overall F1 scores for the LID system and the DTC are 94.66% and 95.69% respectively, which is a significant improvement over the baseline (F1 = 85.29%) for Dutch-English CS detection in Claeser et al. [3]. Both systems illustrate that it is easier to identify Dutch, the main language of the Tweets, although there is an improvement in the classification of the EN tokens in the DTC. The figures for the DTC do not include any post-processing, since the effect of the context rule on the output of the classifier was below the variance of the results across different test splits within cross-validation. The confusion matrix in Table 5 shows the misclassified tokens for the DTC. Most errors stem from tokens that should have been classified as either NL or EN.

Fig. 1. Dutch and English precision and recall with differing maximum rank distance.

The SMT tokens are rarely misclassified, and when they are, it is because the token is an unusual variant of an SMT token in the SMT list.
One of the largest sources of errors consists of Dutch tokens that should have been classified as English. This includes tokens such as 'god', 'pianist', 'pressure', and 'dreaming'. There are two main types of errors. Firstly, single-word inclusions were misclassified due to the context in which they appeared. For example, 'god' and 'pianist' are part of both the Dutch and English lexicon, and were misclassified in these cases because they were used in an English context but classed as borrowed (NL) through the inclusion of a synonym rank. Secondly, tokens were misclassified because the matched synonym is incorrect. A manual inspection of the tokens and their selected synonyms shows, for example, that the synonym selected for 'love' is 'rose'. While these tokens are related in some way, they cannot be considered synonyms of one another. However, because 'love' is more frequent than 'rose', it is automatically classified as a borrowed (NL) word, since the English token is more frequent than its supposed Dutch synonym.

Another source of errors is English tokens that should have been classified as Dutch. In most cases, they were not detected as borrowed words by the classifier. One of the main reasons is that for these tokens the synonyms were not included in any of the three external synonym dictionaries. For example, 'respect', 'defect', 'story', 'highlight' and 'trends' are all part of the Dutch lexicon, but were classified as English. A second reason for misclassification is the inclusion of multi-word code-switched segments. For example, 'minute' is misclassified as English; however, if it is used as part of the phrase 'last minute', it should be considered Dutch. Only the phrase as a whole is considered Dutch, not the individual tokens within the phrase. In order to capture these specific instances, multi-word token sequences would need to be included in the dictionaries, and currently the classifier operates on single tokens.

Table 3. Classifier performance (micro F1)
Classifier | Micro F1
Decision tree classifier | 0.9537
Support vector machine | 0.924
Ada boost classifier | 0.9096
Linear discriminant analysis | 0.8187
Quadratic discriminant analysis | 0.8186
Logistic regression | 0.7729
Neural network | 0.7503

Table 4. P, R, F1 for the individual categories in the DTC and LID system
Language | Precision (%) | Recall (%) | F1 (%)
Decision tree classifier:
Dutch (NL) | 97.16 | 97.23 | 97.19
English (EN) | 88.58 | 88.87 | 88.73
Social Media Token (SMT) | 97.10 | 95.86 | 96.47
Rule-based LID system:
Dutch (NL) | 95.85 | 97.22 | 96.53
English (EN) | 86.50 | 80.21 | 83.23
Social Media Token (SMT) | 97.56 | 98.00 | 97.77

6.2 Evaluation of the Synonym Selection Rule

To evaluate the effect of the synonym detection step on the overall classification process, a list of 400 words tagged as English in the base step of the LID system was extracted for further analysis. Each token was tagged as either borrowed or code-switching based on the information from the synonym dictionaries, and the original language tag (EN) was changed to Dutch whenever the system indicated that the word may be borrowed. This output was then compared to the gold standard, which was based on the presence or absence of a word in the "Woordenlijst Nederlandse Taal". The analysis is based solely on the 400 individual tokens, without taking their context in the Tweet into account. In total, 82% of the tokens were correctly identified as being either borrowed or code-switched.
260 tokens were correctly identi?ed as code-switching, compared to a total of 289 tokens that should have been classi?ed as code-switching and 71 out of 97 tokens have been correctly identi?ed as borrowing. Without this additional step, based on the initial rank information, all of these tokens would have been classi?ed as code-switched (EN), even though many of these are indeed part of the Dutch lexicon and should, therefore, be tagged accordingly. This demonstrates the importance of distinguishing between borrowing and CS in a lan-guage identi?cation system that classi?es closely related languages. As well as analyzing the impact of the synonym dictionary rule as a whole, the two different conditions in which a token is tagged as borrowing have also been examined (see Sect. 5.1 for a description of the conditions). Each condition considers the rank information of the token and the synonym that has been selected as an equivalent match. The ?rst enables the detection of borrowed tokens that have a higher rank than its equivalent Dutch synonym. A total of 53 borrowed words were correctly identi?ed using this method. Among the correctly identi?ed tokens are ‘we’, ‘must’, ‘budget’, ‘crash’, ‘super’, ‘sale’ and ‘media’, ‘perfect’, ‘modern’, and ‘ranking’. The information that was used to classify the tokens is provided in Fig. 2 below. For each of the tokens, the English version was used more frequently than its Dutch equivalent. In some cases, the distance between the ranks of the two synonyms is much larger than others. Table 5. Confusion matrix of the decision tree classi?er NL EN SMT Total NL 2674 66 10 2750 EN 67 611 3 681 SMT 7 2 314 323 Total 2748 679 327 13889 428 S. Kent and D. Claeser The larger the rank distance between the two synonyms, the larger the difference in frequency of use of the borrowed word compared to the Dutch equivalent synonym. In the second rule, a token is classi?ed as borrowed if the distance between the rank of a token and its selected synonym is less than 30,000. The CSB system correctly identi?ed 30 tokens using this rule. Among them is the selection of tokens provided in Fig. 2. In these instances, the frequency of use is higher in the Dutch synonym equivalent than in the token. For example, ‘ticket’ is used relatively frequently in Dutch, although the Dutch version ‘kaartje’ is still used more frequently. In other words, the original Dutch token is used more frequently than the borrowed equivalent of the word. Interestingly, this rule enables the identi?cation of borrowed nouns as well as highly frequent grammatical tokens. The synonym pair ‘me’ and ‘mij’ demonstrated the CSB system’s ability to recognize that ‘me’ is both a Dutch and English pronoun (Fig. 3). Whilst the ?rst borrowing rule may have identi?ed more borrowed tokens overall, a direct comparison of the number of correct tokens identi?ed by both of the rules shows that they are both equally capable at identifying borrowed tokens. 89.9% of the tokens classi?ed by the ?rst borrowing rule were correct and 90.9% of tokens classi?ed by the second rule were correct. The synonym selection process was crucial to successfully differentiating between borrowed and code-switched tokens. In order to judge whether the synonyms are a correct match or not, two Dutch native speakers separately annotated the synonym match lists. The judgment was based solely on whether the two tokens could be synonyms, without taking any context into account. 
These ?gures do not take into account whether or not the token was classi?ed correctly; it focuses solely on whether the synonym match is correct. Generally, there was agreement between the annotators, and the ?nal judgments for each annotator were merged to create an overall judgment list. In total, out of the 97 borrowed tokens identi?ed by the system, 79.4% of all synonyms have a correct match. Table 6 below shows that of the 77 correct synonym matches, only 5% of tokens were incorrectly classi?ed as borrowing. Contrastingly, 40% of tokens with an incorrect synonym were incorrectly classi?ed as borrowing. Therefore, there seems to be a correlation between whether the synonym that is identi?ed by the system is correct or not and the corresponding classi?cation of bor-rowing or code-switching. If the synonym match is correct, the more likely the system will correctly identify whether a token is borrowed or code-switched. Overall, the system is relatively accurate at identifying whether an English token is in fact just English, or whether it also belongs to the Dutch lexicon. These tokens have been de?ned as borrowed tokens in the context of this study, even though strictly speaking not all tokens are actually borrowed from English and some may share another etymology. Nevertheless, the system is able to identify if a token should also be classi?ed as Dutch; so from the perspective of a method able to differentiate between these two languages, the classi?er will be a valuable tool in this process. Incorporating Code-Switching and Borrowing in Dutch-English ALD on Twitter 429 7 General Discussion Our initial assumption was that the Decision Tree classi?er would be the most suitable classi?er for features extracted from the rule-based LID system. However, even though it was the best performing classi?er, the rule related to the rank distance between a token and its corresponding synonym did not transfer. This is true for both versions of the vector. The rule was not learned whether or not the rank distance was explicitly provided. In each case, a different tree is generated, but both are equivocally complex: the classi?er bypasses the synonym rank rule and the model is based on grouping tokens with similar ranks to create paths. We suspect the reason that the classi?er did not learn the rule is that the algorithm that builds the decision tree has the objective to Fig. 2. A selection of correctly identi?ed borrowed (NL) tokens. The token is marked in bold and supplemented by its rank in the Dutch Wikipedia dictionary as well as the synonym selected by the classi?er and its matching rank. Fig. 3. A selection of correctly identi?ed borrowed (NL) tokens using the maximum rank distance rule. Table 6. Correlation between synonym matches and the number of correctly classi?ed borrowed (NL) tokens No. incorrectly classi?ed tokens No. correctly classi?ed tokens Correct matches 5% (n = 4) 95% (n = 73) Incorrect matches 40% (n = 8) 60% (n = 12) 430 S. Kent and D. Claeser ?nd the most ef?cient local split. It aims to create the purest subset with maximum information gain, and consequently fails to detect the global optimum. Instead, the classi?er generated hundreds of speci?c paths to classify small groups of tokens. The second best performing classi?er, aside from the rule-based LID system, is the Support Vector Machine. In contrast to the Decision Tree, the SVM does, in fact, learn the synonym rank rule. 
We believe that this is because the RBF kernel enables the classifier to generalize and learn the concept of a rank threshold for the synonyms. It does so by transforming the non-linear data from the dictionary rank lists into a hyperspace that allows for the separation of the otherwise intertwined examples of borrowing and CS in the rank lists. This assumption is supported by the observation that providing either the ranking distance as explicit information or just the synonym rank has no visible influence on either the runtime or the performance of the resulting SVM. Neither does changing the default value in the vector for non-existing synonyms from 0 to 10 million, a value larger than the size of the dictionary.

Interestingly, the rule-based LID system performed very similarly to the machine learning classifiers. [16] also reported a similar finding, in that the results for the rule-based system were actually slightly better than for the machine learning systems, suggesting that if the rules are designed carefully, language detection for this particular language pair can be just as accurate in rule-based systems as in machine learning systems.

The performance of the systems depends greatly on the quality of the external materials. While designing the systems, we noticed both advantages and disadvantages for the different types of external resources. Firstly, the synonym dictionaries proved to be quite difficult to obtain. The decision was made to combine multiple synonym dictionaries in order to compensate for incomplete dictionaries. The main reason for doing so is the ability to cross-reference entries for the lemmas, which allows for a verification of whether an entry is actually correct. For example, for some dictionary entries, the English translation of a word is listed as a synonym even though it is not officially part of the Dutch lexicon. These tokens caused issues, as they were not included as English tokens in the annotated gold standard, and were consequently incorrectly classified. The most frequently occurring example is ‘why’, which is listed as a synonym for ‘waarom’ in the Open Taal synonym dictionary. This mistaken entry would be easy to rectify if all synonyms not present in at least one other dictionary were disregarded as synonym matches. However, this is not possible with the current synonym dictionaries, as many of the matches occur in only one dictionary: too many entries would be lost and the performance of the identification of the borrowed or Dutch tokens would decrease. If such a frequently occurring word is listed as a synonym even though it is not one, it is likely that this is also true for other entries, which may cause issues in the classification of other tokens in the future. Secondly, the Wikipedia rank list turned out to be a highly suitable external resource. A comparison of the studies describing just a basic dictionary lookup approach with the results obtained in this system illustrates that the quality of the Wikipedia dictionaries enhanced the performance of the first step in the LID system. The system in [16] obtained an F1 score of 38% for identifying English tokens, [18] obtained similar figures (38% and 35%) for the English-Hindi and English-Bengali language pairs, and [19] obtained the highest F1 scores in comparison, with 71% and 73% for Spanish-English and Nepali-English respectively.
In the LID system we present, the most basic version without any additional rules achieved a micro F1 of 72%. This suggests that the quality of the dictionaries is good, because based on just the lookup alone, the results are better than initially anticipated based on previous research. Having said this, a few issues still remain. Firstly, it must also be considered that while Wikipedia contains a large variety of topics and registers, there may be some topics that are overrepresented on Wikipedia and tokens related to that topic are consequently also more frequent in the dictionaries than they would be in other cir-cumstances. Secondly, the use of quotations or names in the articles may also mis-represent the actual frequency of certain tokens. Names of books or ?lms are not translated into Dutch and they are often used in the original language. Consequently, the article ‘the’ is extremely frequent in the Dutch Wikipedia pages even though it is not a Dutch token. In the English rank dictionary, ‘the’ is the most frequently used token and is ranked at one. In the Dutch dictionary, it is ranked at 63. Even if it is highly ranked, the assumption that words are more frequent in their language of origin still holds. Nevertheless, according to the Dutch Wikipedia rank dictionary, the word ‘the’ is more frequent than most Dutch lexical items and it does not match the fre-quency information that one would expect of words that are not a part of the Dutch lexicon. 8 Conclusion The question posed in this paper was whether or not a dictionary-based LID system is suitable for token-level language detection in a closely related language pair. Previous research [3] indicated that lexical items present in both languages, in this case, Dutch- English, caused misclassi?cations in a dictionary-based lookup system. It was dif?cult to identify whether or not a token was code-switched because many English tokens were classi?ed as Dutch. The solution presented in this paper was to combine a system designed speci?cally to differentiate between borrowing and code-switching. The results show that by incorporating this method into token level language classi?cation yields a micro F1 of 94.66% and 95.69% for the rule-based LID system and the DTC respectively. This is a great improvement compared to the baseline (F1 = 85.29%) for Dutch-English CS detection in [3]. Even if the overall result is highly competitive to other similar systems, future research could bene?t from adding a number of improvements. Firstly, named entities were excluded from classi?cation altogether, because as far as we are aware, there are no suitable external named entity recognition systems for code-switched Dutch-English tweets. The systems could bene?t from the addition of named entity recognition, but more importantly, it should be included for the purpose of completing the classi?cation of a Tweet as a whole. Secondly, the synonym selection method could be improved, if context were to be taken into account. Currently, the context information is only used in the ?nal step of the LID system to correct any misclassi?cations by the frequency dictionary lookup and synonym dictionary lookup. It would be interesting to see 432 S. Kent and D. Claeser whether performance improves if this step is implemented within the synonym selection process, rather than as a ?nal step. One of the challenges for the design of the system was acquiring good external resources. 
The dictionaries based on Dutch and English Wikipedia are a highly suitable source for the creation of the language-speci?c word frequency lists. The inclusion of formal and informal language and a wide range of topics ensure many of the tokens are in fact present in the dictionaries. However, there seems to be a lack of freely available material for Dutch natural language processing. The synonym dictionaries, in partic-ular, are not ideal, as three separate dictionaries are necessary to achieve the results in this paper. The performance of the systems would improve with a better quality syn-onym dictionary. It is possible to improve the current dictionaries and tailor them speci?cally to the task at hand by verifying the synonym sets and adding other forms of the tokens already present. This would not only increase the likelihood of a synonym being present in the dictionary, but also the likelihood that the synonym is a correct match. Finally, both systems were developed using the language pair Dutch-English, and because the design of the classi?ers is quite simplistic and not necessarily tied based on a particular language, it would be interesting to see how they would perform on a different closely related language pair. References 1. European Commission: Europeans and their languages. Special Eurobarometer 386 (2012) 2. Poplack, S.: Sometimes I’ll start a sentence in Spanish Y TERMINO EN ESPANOL: toward a typology of code-switching. Linguistics 18, 581–618 (1980) 3. Claeser, D., Felske, D., Kent, S.: Token-level code-switching detection using Wikipedia as a lexical resource. In: Rehm, G., Declerck, T. (eds.) GSCL 2017. Language Technologies for the Challenges of the Digital Age. Lecture Notes in Arti?cial Intelligence, Lecture Notes in Computer Science, vol. 10713, pp. 192–198. Springer, Heidelberg (2018) 4. Johnson, S.: A dictionary of the english language: a digital edition of the 1755 classic. In: Besalke, B. (ed.) The History of the English Language. https://johnsonsdictionaryonline. com/the-history-of-the-english-language/. Accessed 15 April 2014 5. Muysken, P.: Code-switching and grammatical theory. In: Milroy, L., Muysken, P. (eds.) One Speaker, Two Languages: Cross-Disciplinary Perspectives on Code-Switching, pp. 177–198. Cambridge University Press, Cambridge (1995) 6. Auer, P.: Bilingual Conversation. Amsterdam/Philadelphia, Benjamins (1984) 7. Poplack, S., Sankoff, D.: Borrowing: the synchrony of integration. Linguistics 22, 99–135 (1984) 8. Clyne, M.: Dynamics of Language Contact. Cambridge University Press, Cambridge (2003) 9. Solorio, T., Blair, E., Maharjan, S., Bethard, S., Diab, M., Gohneim, M., Hawwari, A., Al- Ghamdi, F., Hirschberg, J., Chang, A., Fung, P.: Overview for the ?rst shared task on language identi?cation in code-switched data. In: Proceedings of the First Workshop on Computational Approaches to Code Switching, pp. 62–72. Doha, Qatar (2014) 10. Molina, G., AlGhamdi, F., Ghoneim, M., Hawwari, A., Rey-Villamizar, N., Diab, M., Solorio, T.: Overview for the second shared task on language identi?cation in code-switched data. In: Proceedings of the Second Workshop on Computational Approaches to Code Switching, pp. 40–49. Austin, Texas (2016) Incorporating Code-Switching and Borrowing in Dutch-English ALD on Twitter 433 11. Shirvani, R., Piergallini, M., Gautam, G.S., Chouikha, M.: The Howard University system submission for the shared task in language identi?cation in Spanish-English Codeswitching. 
In: Proceedings of the Second Workshop on Computational Approaches to Code Switching, pp. 116–120. Austin, Texas (2016) 12. Samih, Y., Maharjan, S., Attia, M., Solorio. T.: Multilingual code-switching identi?cation via LSTM recurrent neural networks. In: Proceedings of the Second Workshop on Computational Approaches to Code Switching, pp. 50–59. Austin, Texas (2016) 13. Bali, K., Sharma, J., Choudhury, M., Vyas, Y.: I am borrowing ya mixing?: An analysis of English-Hindi code mixing in Facebook. In: Proceedings of the First Workshop on Computational Approaches to Code Switching, Doha, Qatar, pp. 116–126 (2014) 14. Patro, J., Samanta, B., Singh, S., Basu, A., Mukherjee, P., Choudhury, M., Mukherjee, A.: All that is English may be Hindi: enhancing language identi?cation through automatic ranking of the likeliness of word borrowing in social media. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 2264–2274, 7–11 September 2017 15. Nguyen, D., Dogruöz A.: Word level language identi?cation in online multilingual communication. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, pp. 857–862 (2013) 16. Dongen, N.: Analysis and prediction of Dutch-English code-switching in social media messages. Unpublished master’s thesis. University of Amsterdam (2017) 17. Postma, M., van Miltenburg, E., Segers, R., Schoen, A., Vossen, P.: Open Dutch WordNet. In: Proceedings of the Eight Global Wordnet Conference, Bucharest, Romania (2016) 18. Das, A., Gambäck, B.: Code-mixing in social media text: the last language identi?cation frontier? Trait. Autom. Lang. 54(3), 41–64 (2013) 19. Maharjan, S., Blair, E., Bethard, S., Solorio, T.: Developing language-tagged corpora for code-switching tweets. In: Proceedings of LAW IX - The 9th Linguistic Annotation Workshop, Denver, Colorado, pp. 72–84 (2015) 434 S. Kent and D. Claeser A Systematic Review of Time Series Based Spam Identi?cation Techniques Iqra Muhammad(?) , Usman Qamar, and Rabia Noureen National University of Sciences and Technology, H-12, Islamabad, Pakistan iqra1804@gmail.com, usmanq@ceme.nust.edu.pk, rabia.noureen15@ce.ceme.edu.pk Abstract. Reviews are an essential resource for marketing the company’s prod- ucts on e-commerce websites. Professional spammers are hired by companies to demote competitive products and increase their own product ratings. Researchers are now adopting unique methodologies to detect spam on e-commerce websites. Time-series based spam detection has gained popularity in the recent years. We need techniques that can help us catch spammers in real time, using fewer resources. Hence, an analysis involving the use of time series is of utmost impor- tance for real-time spam detection. We focus on systematically analyzing and grouping spam detection techniques that either involve the use of temporal features, or have used time series. This study will proceed with analyzing the techniques in terms of accuracy and results. In this research paper, a survey of di?erent time series based spam detection techniques has been presented and limitations of the techniques have been discussed. Keywords: Review spam · Time series · Techniques 1 Introduction In the past decade, the increasing use of e-commerce websites for online shopping has also encouraged users to write reviews on products. This evolution of writing reviews on merchant websites has also led to spammers posting spam reviews. 
Companies sell products on e-commerce websites, hire spammers to post spam reviews for demotion of competitor’s products. Spam has lessened the credibility of online reviews and people become reluctant to buy a product, unsure whether the online reviews about a product are spam or not spam. Online spam reviews a?ect both buyers and sellers. Researchers have adopted a number of approaches for detection review spam. The conventional approaches for detecting review spam involve focusing on one reviewer or a single online review [1]. The authors in previous approaches [1], have detected duplicated reviews in a dataset as spam In addition to this, some previous methods of spam detection have focused on using n-gram features for spam identi?cation [2]. Our study will focus on providing a critical analysis, of the spam detection techniques that make use of time series to identify spam. Some spam detection techniques involve the use of psychological and behavioral features and identifying fake reviews [3, 4]. In addition to this, some state of the art © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 435–443, 2019. https://doi.org/10.1007/978-3-030-02686-8_33 focuses on identifying temporal patterns for detection of spam [5]. Temporal patterns involve exploration of temporal burstiness patterns for detection of opinion spam [5]. Author in [6] introduces a robust spam identi?cation approach, in which content-based factors, rating deviation and activeness of reviewers are employed along the use of time series to identify spam in online reviews. The authors in [6] have listed the disadvantages and advantages of the proposed technique in terms of increasing time e?ciency and reducing high computation requirements. The authors in [7] have linked burstiness with reviewers. Bursts of reviews are de?ned as the abnormal peaks in a time series of reviews. Bursts can occur in a time series due to several reasons. The ?rst reason can be due to the sudden rise of a product’s sale on the merchant website. The second reason for the occurrence of a burst in a time series can be due to spam attacks. Many current state of the art techniques have captured these bursts in time series for identi?cation of spam attacks. A spam review and a spammer can be related in a burst. Spammers like to work in groups while posting reviews hence spam reviews are related in a burst. Non-fake reviews are also related to other non-fake reviews in a burst of time series of reviews. The authors in [8] have used a time series based fake review detection approach in which, they have combined content and usage information. This study [8] has covered product reviews and the behavioral qualities of reviewers. Lastly, the authors in [9] have highlighted the technique of using correlated temporal features for identifying spam attacks. Their methodology [9] of spam attacks revolves around the creation of a multidimensional time series derived from aggregation of statistics. The time series [9] has been constructed to show the e?ectiveness of using correlations. In the current study, a comparative analysis of existing time series based spam detection techniques has been performed. The focal point of our review paper is that after going through mentioned techniques, experts can devise an e?cient time series based spam detection approach that uses novel temporal features. 
Researchers can benefit from this review of time series based spam detection techniques and identify the limitations of the existing techniques in order to propose new temporal-based spam detection methods. The paper is ordered as follows: Sect. 2 defines the terms spam detection and time series, Sect. 3 presents a critical analysis of some of the time series based spam detection techniques, Sect. 4 discusses the techniques, and Sect. 5 consists of the conclusion and future work.

2 Definitions

2.1 Time Series

A time series is defined as a series of data points arranged in time order. Time series are widely used in the banking sector to identify credit card fraud, and they are also applied in anomaly detection [9]. The authors in [9] use multivariate time series as a tool for anomaly detection. Time series have also recently been used in the literature for the detection of opinion spam [8]. A time series can be defined mathematically using the simple regression model:

y(t) = x(t)β + ε(t),    (1)

where y(t) = {y_t; t = 0, 1, 2, …} is a sequence indexed by the time subscript t, x(t) = {x_t} is an observable signal sequence, and ε(t) = {ε_t} is an unobservable white-noise sequence [16].

2.2 Review Spam Detection Techniques

Review spam is defined as the set of fake reviews posted on e-commerce websites. Opinion spam detection techniques [2, 3] have been widely used by researchers to detect fake reviews. Such techniques assist e-commerce websites in automating spam detection.

3 Systematic Review

This section gives an overview of papers in the literature that have used time series or temporal features for the detection of opinion spam.

3.1 On the Temporal Dynamics of Opinion Spamming

In [5], a hybrid technique is used to identify spamming on time series of Yelp reviews. The authors in [5] discovered temporal patterns in the time series and their relationship with the posting rates of spammers. They used vector autoregression methods to predict the fraud rate under multiple spamming policies, and also examined the effect of filtered reviews on future ratings. Three types of spamming policies are covered in [5], and restaurants on Yelp were grouped according to these policies. The authors calculated a set of 10 modalities of normalized time series and, for each behavioral modality, applied time series clustering within a given policy. They also characterized the reasons for spamming by comparing the time series of deceptive ratings with the truthful ratings, using weeks as the time interval of the time series, and identified the major causes of the deceptive ratings using correlation techniques. The authors carried out 5-fold cross-validation with classification on time series features, behavioral features and n-gram features. This technique lacked the use of ten-fold cross-validation when applying classification on the review features, and the authors could also have used an additional set of textual features from the review text to improve the accuracy of the model. The comparison of the different spam detection techniques is shown in Table 1.

Table 1. Precision, recall, F-score and accuracy for all techniques.
Approaches Dataset Precision Recall F-score Accuracy On the temporal dynamics of opinion spamming [5] (late spamming) Yelp hotels and Restaurant Review dataset [14] 86.3 95.3 90.6 90.1 Exploiting Burstiness in Reviews for Review Spammer Detection [7] (Burst review with LBP and local observation) Amazon Review Dataset [13] 83.7% 68.6% 75.4% 77.6% Fake Review Detection via Exploitation of Spam Indicators and Reviewer Behavior Characteristics [8] Amazon Review Dataset [13] 75.2 75 7 74.9 x Detection of Fake Opinions using time series [6] Amazon Review Dataset [13] 82 88 86 x Biomodal Distribution and Co-bursting in Review Spam Detection [10] Dianping’s real-life ?ltered (fake or spam) reviews [15] x x x x Modelling Review Spam Using Temporal Patterns and Co-Bursting Behaviors [12] Dianping’s real-life ?ltered (fake or spam) reviews [15] x x x x Review Spam Detection via Temporal Pattern Discovery [11] Review website (www.resellerratings.com) [11] x x x x 3.2 Exploring Burstiness in Reviews for Review Spammer Detection A sudden rise in the popularity of products or the presence of spam attacks can produce bursts in time series. The authors in [7] have captured these bursts in time series of reviews. Spam reviews are related to other spam reviews in a burst. The reason is that, the spammers work in groups and post spam reviews collectively. Real reviews are related to other real reviews in time series. Author in [7] has proposed a robust spam detection framework that uses a network of reviewers appearing in the peaks of time series. They have also modeled reviewers and their co-occurrence in the peaks as Markov Random Field. In addition to this, they have used Loopy Belief Propagation technique to decide whether a reviewer can be marked as a spammer or not. They also used feature-engineering techniques, in the Loopy Belief Network for network inference. Lastly, they used an evaluation technique of using supervised classi?cation on their reviews. The limitations of this technique [7] include testing the proposed method on other review datasets to increase the validity of their technique. 438 I. Muhammad et al. 3.3 Fake Review Detection via Exploitation of Spam Indicators and Reviewer Behavior Characteristics In [8] the authors have proposed a novel spam detection framework for the identi?cation of spam reviews. This technique combines content and usage information for the iden- ti?cation of spam product reviews. The model also includes reviewer’s behavioral char- acteristics and product reviews. The authors have derived a relationship between both reviews and spammers. Their proposed model [8], identi?ed bursts to examine suspi- cious time intervals of product reviews. The technique has also employed each review- er’s past record of reviewing to derive the authorship attribute. This authorship attribute of a reviewer is a strong indicator of spam in product reviews. The technique [8] has not only considered reviews in burst intervals but also considered reviews outside the burst intervals. The authors employed [8], basic spam indicators like the rating deviation, number of reviews and content similarity. The reviews captured from burst time intervals included spam indicators like content similarity and burst activity. The techniques last step involves a linear weighted scoring function, which integrates the individual scores and calculates a mean output for overall spam score. Lastly, the technique [8] has been validated on a real word review dataset. 
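The linear weighted scoring step mentioned above can be illustrated with a short sketch. The indicator names, weights and threshold below are hypothetical and not those of [8]; the point is only to show how individual spam indicators, each scaled to [0, 1], can be combined into a single spam score.

```python
# Illustrative sketch of a linear weighted scoring scheme of the kind used in [8];
# indicator names, weights and the decision threshold are hypothetical.
def spam_score(review, weights=None):
    """Combine per-review spam indicators (each scaled to [0, 1]) into one score."""
    if weights is None:
        weights = {"rating_deviation": 0.4,
                   "content_similarity": 0.3,
                   "burst_activity": 0.3}
    total_w = sum(weights.values())
    return sum(weights[name] * review[name] for name in weights) / total_w

review = {"rating_deviation": 0.9,    # far from the product's average rating
          "content_similarity": 0.7,  # near-duplicate of other reviews
          "burst_activity": 1.0}      # posted inside a detected burst
print(spam_score(review))             # 0.87 -> flagged if above a chosen threshold
```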
The limi- tations of this technique may include lack of e?ective features. The feature set used for identi?cation of spam reviews can be improved by using additional reviewer based features like reviewers location and taking into account reviewer’s writing style. They can also use a di?erent weighting scoring function for assigning scores, which might improve the accuracy of the model. 3.4 Detection of Fake Opinions Using Time Series Author in [6] focuses on the implementation of a unique time series based spam detection algorithm. The algorithm involves factors like rating deviation, activeness of reviewer and other content based factors or detection of spam reviews. There are certain ?aws associated with conventional spam detection techniques. The proposed technique [6] has tried to overcome ?aws of high time consumption and high computations time. The technique is based on the assumption that the spammers work in groups and spam reviews frequency raises during certain time intervals. Author in [6] has tried to over- come the drawbacks of high time consumption and high computation required for searching for spam in large review datasets. The authors [6] have proposed that the system can be used as a real-time spam ?ltering system. We can easily clean large review datasets from spam reviews. Their proposed model achieved an F-score of 0.86. The limitation of this study is that they have not taken into account, the spam reviews that might exist outside the time series bursts. Secondly, the authors could have increased the accuracy of the model by employing features focused on the characteristics of a spammer like spammer’s IP address etc. Lastly, the validity of their proposed technique [6] can be increased by applying it onto multiple datasets. The technique is domain dependent because it has been created for application on review datasets. A Systematic Review of Time Series 439 3.5 Biomodal Distribution and Co-bursting in Review Spam Detection The author in [10] highlights the issue of spam detection and proposes a hybrid approach of using biomodal distribution and co-bursting factors. According to the authors, online reviews are critical for the comparison of di?erent products on merchant websites [10]. As explained earlier in the article spammers and fraudsters take advantage of online reviews and post fake opinions to attract customers on certain products. The previous approaches have made us of review contents, reviewer’s behavioral traits and rating patterns. This research [10] has focused on exploiting reviewer’s posting rates. The authors [10] discovered that the reviewers posting rates have a biomodal relationship with each other. According to [10], spammers post reviews in a collective manner within short intervals of time. This phenomenon of posting reviews collectively is called co-bursting. The authors in [10] have discovered patterns in a reviewer’s temporal dynamics. Authors in [10] include a labeled hidden Markov model with two modes. This model has been used to detect spamming using a single reviewer’s posting times. The method is then extended to couple hidden Markov model for identifying posting behavior and signals with co-bursting. They have also proposed a co-bursting network based model, which aids in detection of spammers. The proposed approach [10] lacks evaluation of the model through the use of supervised machine learning techniques. 
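Several of the surveyed techniques [6, 7, 8] rely on locating bursts, i.e. abnormal peaks in the number of reviews per time window. As a generic illustration (not the detection rule of any specific paper), a window can be flagged when its review count exceeds the series mean by some multiple of the standard deviation:

```python
# Simple burst flagging over weekly review counts: a window is marked as a burst
# when its count exceeds the series mean by k standard deviations.
# Generic illustration only, not the rule used by any of the surveyed papers.
import numpy as np

def find_bursts(counts, k=2.0):
    counts = np.asarray(counts, dtype=float)
    mu, sigma = counts.mean(), counts.std()
    return np.where(counts > mu + k * sigma)[0]

weekly_reviews = [4, 6, 5, 7, 5, 48, 6, 5, 4, 6]   # week index 5 is a spam burst
print(find_bursts(weekly_reviews))                 # -> [5]
```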
3.6 Review Spam Detection via Temporal Pattern Discovery This proposed approach [11] provides evidence of spam attacks being bursty. The bursts in a time series can be either positive or negative. The authors propose [11] a correlated temporal approach to detect spam. This approach uses singleton reviews spam identi?- cation. In addition to this, it maps SR spam identi?cation to correlated pattern detection. The proposed approach [11] is based on multidimensional time series anomaly detection algorithm. The algorithm involves making a multi-scale time series and use statistics with joint anomalies as an indicator of spam. The detected statistics involve factors like average rating, ratio of singleton reviews and lastly the average rating of reviews. The time-series, is then developed and an SR spam detection model is based on this time series. The algorithm also uses integration of longest common subsequence and curve ?tting. Both of these factors are used to ?nd abnormal sections in each dimension of time series. The authors [11] have introduced a ranking technique to sum up all anomalies in various dimensions for detection of abnormal sections in time series. Fluctuations are common in time series. This algorithm has used a time window size of more than two months, so that noises in the time series can be smoothed. In a certain scenario, if a singleton review spam attack occurs in time series, the time window size is decreased so that any further abnormal patterns can become more obvious. The construction of time series is done, and this time series is multi-dimensional. Multi-dimensional time series is then used to identify abnormally correlated pattern detection problem. The results of this methodology show that it is quite e?ective in identi?cation of singleton review spam attacks. The limitation of this approach can be that this technique is not 440 I. Muhammad et al. applicable on other types of spams like sms and email spam. The model has been tested on a single dataset. 3.7 Modelling Review Spam Using Temporal Patterns and Co-bursting Behaviors This technique [12] is based on a real life dataset from a review hosting site called dianping. The authors [12] discovered that reviewers posting rates were biomodal. In addition to this scenario, the transitions between di?erent states could be used to detect spammers from real reviewers. The technique proposed, involves a two model labeled hidden Markov model for identi?cation of spammers in review websites. The ?ndings of the model prove that the existing approach can outperform, supervised machine learning algorithms. Spammers are keener on writing reviews in a group and hence bursts in time series of reviews are created. The authors in [12] propose a co-bursting based approach for identifying spammers. This framework can enable more precise detection of spammers and outperforms the current state of the art mentioned in [12]. The authors have also mentioned that biomodal distributions are disparate and these distributions were identi?ed in both form as review spammers and non-spammers. The limitation of this approach is that it requires time stamps of reviews in a dataset. Without the presence of time stamps, the approach is not applicable in real life datasets. The advantage of the algorithm is that it can be applied to commercials review spam ?lters. 4 Discussion We have compared all the approaches using the metrics of precision, F-score, recall and accuracy. 
The amount of precision, recall, f-score and accuracy for each technique has been taken from the articles mentioned in Table 1. A comparison has been made among the techniques keeping in view the fact that most of these approaches have been applied to the similar datasets. After the comparison, it can be seen that only some of the algo- rithms mentioned in Table 1, have used precision, recall, f-score and accuracy for comparison. The ?rst article referred in Table 1, uses the dataset of Yelp [14]. Yelp [14] is a website that provides reviews on hotels and restaurants. Spammers work in groups to post fake reviews about certain hotels. Spammers target hotels and Restaurants and fake reviews cause their ratings to decrease. This approach [5] has achieved an accuracy of 90.1 with late spamming. Late spamming achieved the best set of precision and accu- racy among all three types of spamming. The second approach [7] mentioned in Table 1, is based on exploration of burstiness in reviews for spammer identi?cation. This approach [7] produced these set of results in the table with the use of LBP and local observation techniques. This algorithm used Amazon review dataset [13]. Amazon review dataset [13] provides a large-scale dataset on various set of products. Products rating, reviews and other attributes have been included in Amazon review dataset. This approach achieves an accuracy of 70.1 with the LBP and local observations. The third approach [8], included in the table is based on spam detection by using reviewer char- acteristics and various spam indicators. This approach [8] also used Amazon review A Systematic Review of Time Series 441 dataset [13]. The approach [8] didn’t use the metric of accuracy for evaluating its model. It achieved an F-score of 74.9%. The fourth algorithm [6], mentioned in the table makes use of time series and other reviewer traits to detect spam in reviews. It has also Amazon review dataset [13]. The model achieved an F-score of 86%. This model [6] didn’t used any supervised machine learning technique to classify the suspicious set of reviews as spam or non-spam. The ?fth article [10] included in Table 1, is based on a biomodal distribution model used to detect review spam. This model used dianping’s [15] real life dataset. Dianping [15] is a Chinese website that includes reviews about consumer products and retail services. Dianping dataset is the single largest dataset to have spam and non-spam classes. Each review is for a single individual. There have been references in the liter- ature of yelp datasets [14], with class labels but these datasets are much small in size when compared to dianping dataset [15]. The authors in [10] have reasonably argued their choice of dataset because of its large size and presence of labels. The models proposed by this article [10], outperform existing models on this huge dataset [15]. This paper [10] didn’t use any metrics like accuracy, precision, recall and f-score for its spam detection model evaluation. The next technique [13] included in Table 1 has used temporal patterns and co-bursting factors to identify spam in review dataset. This article [12] has also used dianping’s real life dataset [15]. Temporal features were extracted from the dataset time stamps [12]. The authors in this article [12] haven’t used metrics like precision, recall, f-measure and accuracy for evaluation of proposed model. The last technique [11] mentioned in Table 1 has highlighted the importance of temporal features in reviews, for spam detection. 
Temporal patterns have been discovered in the reviews of a reseller website [11]. The dataset [11] contained around 408,469 reviews. Each review in the dataset [11] can be identi?ed by a unique id. The authors in [11] used the dataset for suspicious store detection via identi?cation of singleton spam attacks. Human evaluators in [11] were used to perform validation of the results by reading reviews from all 53 stores and singling out stores that were suspicious. This technique did not employ metrics like precision, recall, f-score and accuracy for evaluation of its model. In conclusion, all approaches mentioned in Table 1, used time series based on the assumption that spammers work in groups when posting spam reviews. Their collec- tive manner of working produces bursts in times series of reviews and we can easily capture these bursts for spam detection. 5 Conclusion and Future Work This research paper highlighted state of the art methods that involved the use of time series for spam detection in online reviews. It made a critical comparative analysis of the techniques present in the literature. It also showed the details of the techniques of each related article in the literature related to time series based spam detection. Secondly, we also provided a summarized overview of all techniques, their used datasets and made a comparison of the metrics used for the evaluation of the proposed models. Our review paper can be used by experts as an asset while searching for state of the art relevant to time series based spam detection. Future work of this study includes proposing a hybrid 442 I. Muhammad et al. approach to time series based spam detection. The model can include more diverse feature engineering techniques and the use of supervised machine learning techniques for suspicious reviews ?ltered by time series. References 1. Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining - WSDM 2008 (2008) 2. Li, J., Ott, M., Cardie, C., Hovy, E.: Towards a general rule for identifying deceptive opinion spam. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2014) 3. Dewang, R.K., Singh, P., Singh, A.K.: Finding of review spam through “Corleone, review genre, writing style and review text detail features”. In: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies - ICTCS 2016 (2016) 4. Mukherjee, A., Kumar, A., Lin, B., Wang, J., Hsu, M., Castellanos, M.: Spotting opinion spammers using behavioral footprints. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 632–640 (2013) 5. Kc, S., Mukherjee, A.: On the temporal dynamics of opinion spamming. In: Proceedings of the 25th International Conference on World Wide Web - WWW 2016 (2016) 6. Heydari, A., Tavakoli, M., Salim, N.: Detection of fake opinions using time series. Expert Syst. Appl. 58, 83–92 (2016) 7. Fei, G., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Exploiting burstiness in reviews for review spammer detection. In: Kiciman, E., et al. (eds.) ICWSM. The AAAI Press (2013) 8. Dematis, I., Karapistoli, E., Vakali, A.: Fake review detection via exploitation of spam indicators and reviewer behavior characteristics. In: SOFSEM 2018: Theory and Practice of Computer Science Lecture Notes in Computer Science, pp. 581–595 (2017) 9. 
Li, J., Pedrycz, W., Jamal, I.: Multivariate time series anomaly detection: a framework of hidden Markov models. Appl. Soft Comput. 60, 229–240 (2017) 10. Li, H., Fei, G., Wang, S., Liu, B., Shao, W., Mukherjee, A., Shao, J.: Bimodal distribution and co-bursting in review spam detection. In: Proceedings of the 26th International Conference on World Wide Web - WWW 2017 (2017) 11. Xie, S., Wang, G., Lin, S., Yu, P.S.: Review spam detection via temporal pattern discovery. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2012 (2012) 12. Li, H., Fei, G., Wang, S., Liu, B., Shao, W., Mukherjee, A.: Modeling review spam using temporal patterns and co-bursting behaviors. arXiv preprint arXiv:1611.06625 (2016) 13. Amazon: Amazon (2018). http://snap.stanford.edu/data/amazon/productGraph/. Accessed 4 Feb 2018 14. Yelp: Yelp (2017). http://www.yelp.com. Accessed 6 Dec 2017 15. Dianping Chinese Review dataset. http://liu.cs.uic.edu/download/dianping/. Accessed 6 Apr 2018 16. Hamilton, J.D.: Time Series Analysis, vol. 2. Princeton University Press, Princeton (1994) A Systematic Review of Time Series 443 CNN with Limit Order Book Data for Stock Price Prediction Jaime Nino ˜ 1(B) , German Hernandez1 , Andr´es Ar´evalo1 , Diego Leon2 , and Javier Sandoval2 1 Universidad Nacional de Colombia, Bogot´a, Colombia {jhninop,gjhernandezp,ararevalom}@unal.edu.co 2 Universidad Externado de Colombia, Bogot´a, Colombia {diego.leon,javier.sandoval}@uexternado.edu.co Abstract. This work presents a remarkable and innovative short-term forecasting method for Financial Time Series (FTS). Most of the approaches for FTS modeling work directly with prices, given the fact that transaction data is more reachable and more widely available. For this particular work, we will be using the Limit Order Book (LOB) data, which registers all trade intentions from market participants. As a result, there is more enriched data to make better predictions. We will be using Deep Convolutional Neural Networks (CNN), which are good at pat-tern recognition on images. In order to accomplish the proposed task we will make an image-like representation of LOB and transaction data, which will feed up into the CNN, therefore it can recognize hidden pat-terns to classify FTS in short-term periods. We will present step by step methodology to encode ?nancial time series into an image-like represen-tation. Results present an impressive performance, ranging between 63% and 66% in Directional Accuracy (DA), having advantages in reducing model parameters as well as to make inputs time invariant. Keywords: Short-term forecasting · Deep Learning Convolutional Neural Networks · Limit Order Book Pattern recognition 1 Introduction Finance has become a highly sophisticated scienti?c discipline that depends on innovations from computer science to analyze huge ?ows of data in real time. Finance o?ers nonlinear relationships and large data sets on which Machine Learning (ML) ?ourishes, but they also impose tremendous challenges when applying these computational techniques, due to data noisiness, non linearities among other characteristics of ?nancial systems. Literature is vast when report-ing applications using machine learning methods for FTS modeling [4,6,9,11,15]. Works include Arti?cial Neural Networks, Support Vector Machines, among others. Lately, Deep Learning has emerged as a superior ML technique for a .o c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 
444–457, 2019. https://doi.org/10.1007/978-3-030-02686-8_34 CNN with Limit Order Book Data for Stock Price Prediction 445 wide variety of ?elds, including Image Recognition, Audio Classi?cation, Natu-ral Language Processing, as well as FTS Forecasting and Algorithmic Trading among others. In this work, we use a Convolutional Neural Network to predict movements of FTS. We will be working with both LOB and transaction (tick) data. LOB data contains all traders intentions to negotiate an asset at a par-ticular price and quantity at certain time t. LOB information is richer than transaction data, which only records prices and quantities exchanged at certain time t. In order to use CNN, we represent both LOB and tick data as images. Results are very competitive when compare to other DL approaches reported in [1,3,7,16,20], with the advantage of using the same trained model for di?erent assets. This paper continues as follows: Sect. 2 explains how LOB and tick data is transformed into images, Sect. 3 gives a brief summary of CNN, Sect. 4 explains the methodology to process and classify image data, Sect. 5 shows results and Sect. 6 gives ?nal remarks, conclusions, and further work opportunities. 2 Limit Order Book and Tick Data Transformation 2.1 De?nitions Limit Order Book. Order Book Data records market agents buy/sell inten-tions. It includes a time-stamp, quantity and price to buy/sell. This data is known as Limit Order Book (LOB). Formally, an order x = (p, q, t, s) sent at time tx with price px, quantity qx (number of shares) and side sx (buy / sell), is a commitment to buy/sell up to qx units of an asset at price px. Orders are sorted by arrival time t and quoted price p. Sell orders have larger prices than buy orders. [5,8,18] Some other useful concepts include [8,18]: – Spread size is the di?erence between the best sell and buy price. – Bid price is the highest price among all active buy orders at time t. Conversely, Ask price is the lowest price among all active sell orders at time t. Both are called best quotes. – An LOB L(t) is the set of all active orders at time t. Dynamics of LOB are complex [5,8], since it re?ects interactions among mar-ket agents with a di?erent point of views and di?erent trading strategies. For a particular time t, LOB concepts are illustrated in Fig. 1. When all recorded intentions are joined, they can be seen as an Image Fig. 2. On this image representation, y-axis represent prices, the x-axis is time and each point is a quantity willing to be traded. The darker the color the most quantity q at certain price p. In [19], authors used this graphic representation to cluster LOB-Patterns in order to build a classi?er. Based on this work, LOB data can be seen as a list of tuples (prices-quantities) where agents expect to negotiate. Numerically, this representation can be seen as a multivariate FTS1 . 1 Some considerations should be done, particularly related to the dimensionality of the FTS. 446 J. Nino ˜ et al. Fig. 1. LOB snapshot, taken from [8]. LOB Representation. For a set of successive timestamps, LOB data can be represented as a matrix-like object, where column labels are timestamps, row labels are prices and the content of each cell is the number of shares to bid/ask. Each cell contains a quantity q, with subindex side s, time t and price line p. Order side could be either ask a or bid b. Because there are order imbalances, price lines subindex are k for the ask side and j for the bid side (Table 1). Table 1. LOB matrix representation t0 t1 ... 
tn
AskPrice_k        q_{a,0,k}      ...    q_{a,n,k}
AskPrice_{k-1}    q_{a,0,k-1}    ...    q_{a,n,k-1}
...               ...            ...    ...
AskPrice_0        q_{a,0,0}      ...    q_{a,n,0}
BidPrice_0        q_{b,0,0}      ...    q_{b,n,0}
...               ...            ...    ...
BidPrice_{j-1}    q_{b,0,j-1}    ...    q_{b,n,j-1}
BidPrice_j        q_{b,0,j}      ...    q_{b,n,j}

Normalizing each q_{s,t,i} between 0 and 255 will produce a LOB grayscale image. However, there is a lot more information in the LOB data. Because each order is recorded individually and sorted by arrival time, it is possible to aggregate volumes at the same price. By doing so, one can obtain how many different orders (quotes) are placed at the same price. Formally, for each unique price p, all quantities q_k are added, where q = [q_1, q_2, ..., q_m] and m is the last order entered at price p. This information is very important because it is different to have many distinct agents interested at one particular price than just a few. However, under real market conditions, this fact goes hand in hand with how much volume (quantity) of the asset is available at that particular price p. In other words, it is important to have some sense of the distribution: it is different to have a lot of volume concentrated in just one participant than distributed across many. To introduce this information into our representation, we used max_{p_k}(q) for each unique price p at line k, signaling a sense of the volume distribution. As a result, we represent LOB data in a 4-channel representation, which can be seen as an RGBA image (Fig. 2), where:
– The R channel is only used for ask volumes q_a, 0 otherwise.
– The G channel is only used for bid volumes q_b, 0 otherwise.
– The B channel is only used to represent the total number of placed orders at a unique price p.
– The A channel is only used to represent the volume distribution for a unique price p, taking max_{p_k}(q).

Fig. 2. LOB as image, taken from [19].

Tick Data. Tick data records transactions, that is, prices and quantities exchanged for a particular asset. Formally, a transaction occurs when at time t the bid price equals the ask price. At this point, a transaction T = (p, q, t) occurs, where p_T is the price, q_T is the quantity of shares exchanged and t_T is the transaction time-stamp [18]. Tick data is a univariate time series (bivariate if volumes are included).

Tick Data Graphical Representation. As mentioned before, tick data is the most widely used data when modeling FTS, because it is easier to obtain. LOB data is more difficult to get and usually costs a lot, not just in terms of money but also in terms of storage. Transactions are heavily influenced by the intentions recorded in the LOB, but they do not have the richness of the LOB. Nevertheless, we expect that, in conjunction with the LOB, they will yield better results. In order to homogenize inputs, it is necessary to transform tick data into a matrix-like representation. In [22], the authors show a step-by-step methodology that transforms a univariate time series into an image representation. This transformation is called the Gramian Angular Field (GAF) [22], which consists of the following steps:
– Normalization of the time series to [-1, 1].
– Conversion of the time series from Cartesian to polar coordinates:

φ_i = arccos(x_i), -1 ≤ x_i ≤ 1;  r_i = t_i / N, t_i ∈ ℕ,    (1)

where t_i is the time stamp and N is a constant factor regularizing the span of the polar coordinate system.
– Deduction of the GAF matrix, defined as:

G = [ <x_1, x_1>  ...  <x_1, x_n>
      <x_2, x_1>  ...  <x_2, x_n>
      ...               ...
      <x_n, x_1>  ...  <x_n, x_n> ],

where <x, y> = x·y − √(1 − x²)·√(1 − y²).

The authors in [22] used this transformation for non-financial time series.
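A compact NumPy sketch of the three GAF steps just listed might look as follows. This is a generic implementation of the transformation from [22], not the authors' code, and the final rescaling to pixel intensities is an assumption that mirrors the grayscale normalization used above for the LOB representation.

```python
# Gramian Angular Field (summation form) for a univariate tick-price series,
# following the three steps above; generic sketch, not the paper's implementation.
import numpy as np

def gaf(series):
    x = np.asarray(series, dtype=float)
    # 1. rescale to [-1, 1]
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    # 2. polar encoding: phi = arccos(x); the radius t_i / N is implicit here
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # 3. Gramian matrix G[i, j] = cos(phi_i + phi_j)
    #    = x_i * x_j - sqrt(1 - x_i^2) * sqrt(1 - x_j^2)
    return np.cos(phi[:, None] + phi[None, :])

prices = [100.0, 100.5, 100.2, 101.0, 100.8, 101.5]
G = gaf(prices)
# Scale to 0-255 to view the matrix as a grayscale image channel (an assumption).
img = np.uint8(255 * (G + 1.0) / 2.0)
print(G.shape, img.dtype)   # (6, 6) uint8
```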
In this paper, we apply the same general steps in order to obtain a graphical version of the tick data, as illustrated in Table 2. One advantage of this transformation is that marks peaks of the input signal, based on intensity levels Table 2. This is useful for pattern recognition because it helps to di?erentiate price variances within the original signal. On the other hand, the transformed input can be rolled back to the original signal [22]. We expect that on this new space, patterns could be easier to identify since CNN’s learning capabilities have been proven good in frequency spaces. In fact, in a previous work we show how a wavelet transformation improve results over a pure time-space approach [1] 4 . 3 Deep Learning - Convolutional Neural Networks The concept of Deep Learning (DL) was adopted from Neuroscience [13], where the seminal authors [17] proposed a novel way of how our visual cortex processed data coming in through our visual system using a layered representation, starting in the retina all the way up to the visual cortex. Their proposal consisted of making sparse representations of input data, in order to get its appropriated representation. In other words, any instance of data can be reconstructed as a di?erent linear combination of the same components from sparse representations from the original data or to make more complex representations of the data at each layer by combining the representation of the previous layer [13]. 3 For full details please refer to [22]. 4 We used other DL topologies. CNN with Limit Order Book Data for Stock Price Prediction 449 Table 2. Original tick data vs Image representation of tick data Tick-data line chart Image representation This development was computational feasible only until 2006, when semi-nal authors [10], proposed a novel Unsupervised Learning algorithm to train deep architectures consisted of Restricted Boltzmann Machines (RBM). This model was capable of building complex representations of data at deeper layers by capturing sparse representations from the previous ones. At that time, this algorithm won an Image Classi?cation contest and it was established as the DL introduction [13]. Since its emergence, DL has facilitated the application and use of di?erent neural network topologies more successfully in di?erent ?elds, due to the fact that DL tackles the issue of gradient vanishing while training multilayer networks. As a result, di?erent network topologies are being used with DL, including tradi-tional Multilayer Perceptron (MLP), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) Networks, Deep Belief Networks (DBN) and Con-volutional Neural Networks (CNN). Each topology has its own particularities. In the case of CNN, they have been used for Image Processing and Classi?cation task. A CNN is a variation of a Multilayer Perceptron, which means that it is a feed-forward network, however, it requires less processing when compared to a MLP, due to the mechanism used to process input data. Moreover, CNN’s main characteristic is to be space invariant, that is due to the convolution operator that transform data inputs. CNN are biological inspired, trying to emulate what happens in mammal’s Visual Cortex, where neural cells are specialized to distinguish particular fea- 450 J. Nino ˜ et al. tures. Building blocks of a CNN architecture are in charge of doing this feature detection by activating or de-activating a set of neurons. 
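Before turning to the individual layers, a minimal numeric illustration of the two building blocks referred to here, a 2-D convolution followed by 2×2 max pooling, may help. It is purely didactic and unrelated to the network used in this work.

```python
# Didactic illustration of a 2-D convolution (edge-detecting kernel) followed by
# 2x2 max pooling, the two CNN building blocks described in the text.
import numpy as np

def conv2d(img, kernel):
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2x2(x):
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1]], dtype=float)  # responds to the dark-to-bright vertical edge
feat = conv2d(img, kernel)                 # activation of 1 exactly along the edge
print(maxpool2x2(feat))                    # subsampled feature map, shape (2, 1)
```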
Since market agents' decisions are mostly made from visual analysis of price changes and events in the LOB, we expect that an algorithm can learn patterns in order to help trigger trading decisions. In fact, [18, 19] showed that a visual dictionary could be constructed from LOB data and that this dictionary had predictive capabilities. The two main building blocks of a CNN are the convolution layer and the pooling layer, which in conjunction with a dense layer complete a CNN.

Convolution Layer. It is in charge of applying the convolution operator to the input matrix; in other words, it applies a kernel to filter the data input. Depending on the parameters used, it can reduce or maintain the input's dimensionality. The reason to convolve is to identify edges, that is, to identify or separate features that can later be used to construct more complex representations in deeper layers.

Pooling Layer. It is a local operator that takes the convolution output and maps subregions into a single number. The pooling operator can extract the maximum value of the mapped subregion (max pooling) or the average value of the mapped subregion (average pooling). In other words, it obtains subsamples of the convolution layer's output. Usually both layers are treated as one layer in the CNN topology; however, it is not necessary to have exactly one convolution and one pooling layer. Additionally, CNN topologies usually include several layers of convolution plus pooling, so that the network extracts simpler features at the first layers and, by combining those, can learn more complex features in deeper layers.

Dense Layer. Finally, the deepest convolutional layer is connected to a dense (fully connected) layer, from which the network obtains its outputs. As mentioned before, the CNN topology may have one or more dense layers.

AlexNet and LeNet: Well-Known CNN Architectures. LeNet-5 is a CNN created by [14] and was aimed at hand-written digit recognition. It consists of 7 layers (Input, Conv + Pool, Conv + Pool, Dense + Output). At that time, computing resources were scarce, creating a constraint for this technique. However, as computing resources improved in performance and cost, training this particular architecture became easy and it has become a baseline in image recognition contests. AlexNet was created in 2012 and became famous because it reduced the classification error in an image recognition contest to 15.3% at that time. Nowadays, the classification error is much lower. Since AlexNet was the pioneer, it has become a baseline architecture, like LeNet. AlexNet took advantage of computing developments, particularly parallel processing through Graphics Processing Units (GPUs). It was created by [12]. It has more filters than LeNet as well as stacked convolution layers; as a result, it is deeper and has more parameters.

We decided to compare different CNN topologies in terms of directional accuracy (DA), in order to analyze the advantages and disadvantages of each one. We will make the comparison with another self-created CNN topology. The next section gives a step-by-step explanation of our experiment.

4 Classifying Financial Time Series with CNN

4.1 Why a CNN for FTS Classification

– Firstly, DL models have demonstrated greater effectiveness in both classification and prediction tasks, in different domains such as video analysis, audio recognition, text analysis and image processing.
Their superiority is due to the fact that they are able to learn useful representations from raw data, avoiding the local minimum issue of ANNs, by learning in a layered way using a combination of supervised and unsupervised learning to adjust the weights W.
– Secondly, DL applications in computational finance are limited [2, 3, 7, 21, 23] and, to the best of our knowledge, there is no publication applying CNNs to FTS, particularly using LOB data for short-term forecasting.
– Thirdly, CNNs are good at pattern recognition, and real traders have told us that they try to identify patterns by following buy/sell intentions in numeric form. In a previous work, [18] identified volume barrier patterns to translate them into trading decisions, and [19] identified visual patterns and clustered them into a bag-of-words model to predict market movements. As a result of these works, we decided to extend them and use a more suitable technique for pattern recognition, such as a CNN, on image-like representations of market data.
– Finally, by changing the input space (from time to frequency), we expect that the CNN will recognize patterns more effectively; indeed, the authors in [1] improved their results by using wavelets to represent high-frequency data of several financial assets. Even though our images are not natural ones, we expect that the CNN's layers are capable of distinguishing simple frequency changes (edges) at lower layers in order to identify more complex patterns at deeper ones.

4.2 Experimental Setup

– Data acquisition: The original dataset is composed of LOB and transaction data for 12 stocks listed on the Colombian Stock Market, from Feb 16, 2016 to Dec 28, 2017. The dataset includes 184,450 LOB files and 612,559 ticks (transactions), totaling 590 MB on disk (data provided by DataDrivenMarket Corporation).
– Data preparation: For each stock, data normalization was conducted, taking into account some considerations which include the handling of missing orders at some price levels in the LOB data, some liquidity constraints and LOB events. Details are given in the next subsection.
– Data transformation: For each stock, both LOB data and tick data are transformed into an image-like representation, following the methodology previously explained.
– CNN modeling: We chose a base CNN architecture and trained and tested it with the transformed data.
– Model comparison across different CNN architectures: We used another two CNNs, which mimic the standard LeNet and AlexNet architectures, to compare against the proposed model.
– CNN comparison against other DL topologies: We compare the results obtained in this work against others that have been used for similar problems (short-term forecasting) but with different Deep Learning topologies (RNN, LSTM, Multilayer Perceptron, DBN).

The following paragraphs provide further details of our experimental setup.

4.3 Data Preparation

Data Normalization. For each stock, prices, volumes (quantities) and the number of orders at the same price were normalized into (0, 1]. Given the fact that LOB data may have price levels with no demand/offer, minimum values were shifted by a small factor so that they take a small value above zero. This is because empty cells in the LOB have a 0 value, so we can differentiate a missing entry in the LOB from an entry with a very low volume or just one order at a certain price p. Data normalization by stock facilitates magnitude equilibrium across all stock data, regardless of their nominal prices or volumes. In other words, we homogenize the image representation in its different dimensions: price, quantities and number of orders.
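To make this rule concrete, the sketch below scales one channel of one stock into (0, 1] while keeping empty price levels at exactly 0. It is only an illustration of the idea: the offset eps stands in for the unspecified "small factor" mentioned above, and the function name is ours, not part of the authors' code.

```python
import numpy as np

def normalize_channel(values, eps=1e-3):
    """Scale one LOB channel (prices, volumes or order counts) of one stock into (0, 1].

    Empty price levels are encoded as 0 and stay at 0, so they remain
    distinguishable from very small but real entries.
    """
    values = np.asarray(values, dtype=float)
    filled = values > 0
    out = np.zeros_like(values)
    if filled.any():
        lo, hi = values[filled].min(), values[filled].max()
        span = (hi - lo) or 1.0                      # guard against a constant channel
        out[filled] = eps + (1.0 - eps) * (values[filled] - lo) / span
    return out

volumes = np.array([0.0, 120.0, 0.0, 4500.0, 800.0])
print(normalize_channel(volumes))                    # empty levels stay 0, smallest entry ~ eps
```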
Handling of Liquidity Constraints. Given the fact that the Colombian market is not highly liquid, we only took, for each stock, LOB data that had enough entries in a single trading day. That is, we took trading days which had more than 100 LOB files per stock, which is equivalent to having at least one LOB event for any given stock every three and a half minutes on average. For classification purposes, it would not matter to have low-liquidity days mixed with high-liquidity days. However, for practical purposes, liquidity constraints are very important in financial markets, because spreads may vary widely as liquidity drops. That is the reason why we chose samples corresponding to highly liquid days.

Handling of LOB Events. We took an event-based approach, that is, we analyze a fixed number of LOB events (10 in this case). This means that the LOB matrix explained in Sect. 2 (Table 1) was partitioned into fixed segments of 10 events, and we took all of the ticks that happened between these 10 LOB records to create the corresponding image for the tick data. Figure 3 illustrates the procedure described above.

Fig. 3. LOB events.

Handling of LOB Depth. LOB data may have many different lines, or prices, on each side (bid/ask). Depending on market conditions, depth varies widely; that is, you will not always have a symmetric number of lines for each book side. We decided to work with LOB data of 10 lines of depth, that is, the first 10 different prices for each side. Prices start from the best quotes and move down or up depending on the side (bid/ask).

Additional Considerations. It is important to note the following:
– Price dynamics may produce a price matrix with more than 20 rows (prices), as in Fig. 1. In other words, we will have unequal heights for each 10-event LOB image. Table 3 shows the results graphically.
– Prices with no volume will have a 0 value. This value will always be different from the lowest volume after normalization, as mentioned before.
– To make the LOB images' size homogeneous for modeling purposes, we resize each image to a width of 10 and a height of 40. Individual price matrices have different heights because of price dynamics; in other words, there is a different set of prices at each time t, depending on traders' intentions.

4.4 CNN Modeling

Data Input. Four-channel images are used, one for the LOB data and another for the tick data. A five-dimensional tensor is used for the data input, with size [n, 2, 10, 40, 4]. The first dimension is the number of samples, the second one is the number of image categories (LOB/tick), and the other three are the image dimensions (width, height, channels).

Table 3. LOB data images (image examples not reproduced here).

Data Labeling. Data will be classified into three different classes:
– Class 0: Upward movement
– Class 1: Downward movement
– Class 2: No trending movement
The class specification was based on how the following set of ticks behaves after a set of 10 LOB events. An analysis of the ticks was done to set the thresholds, which Table 4 illustrates.

Table 4. Three-class rules

Price direction     Rule                                                                Class
Upward movement     Last tick price above +0.03% vs. last tick of the previous window   0
Downward movement   Last tick price below −0.03% vs. last tick of the previous window   1
Flat movement       Otherwise                                                            2

CNN Architecture. We use a standard CNN architecture, which consists of (Input + Conv + Pool + Conv + Pool + Dense + Dropout); the input images' size is 10 × 40.
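For concreteness, a topology of this shape could be sketched with Keras roughly as follows. The paper states that TensorFlow was used but does not report filter counts or kernel sizes, so the numbers below are illustrative assumptions, not the authors' configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(40, 10, 4))                 # height x width x 4 RGBA-like channels
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.4)(x)                                 # the 40% dropout reported in the setup
outputs = layers.Dense(3, activation="softmax")(x)         # up / down / flat classes
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```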
We compare it to AlexNet and LeNet. We had to make some modifications to the input images' size (20 × 40) as well as to the filter sizes in some convolution layers, particularly for the AlexNet-type configuration. We set up two different experiments, one using LOB data only and another using both LOB and tick data, meaning that the second one had more input information. We used TensorFlow. Special training considerations included a dropout rate of 40% and a batch size of 100. The dataset was split into 90% for training and 10% for testing. The number of samples was 67,348 images. Moreover, a similar setup was built taking into account only LOB data, which means a whole set of experiments working with 2D convolutions.

5 Results

5.1 Model Comparison Across Different CNN Architectures

The CNNs were used to classify the three target classes (Up, Down, Flat). Table 5 shows the performance of the three different architectures over the testing samples. The combination of LOB and tick data as model features significantly increased the model accuracy; it achieved accuracies greater than 65%. LeNet* and AlexNet* had a better performance than the proposed topology, but they require much more computational power for training, which could become a serious problem in a real high-frequency trading strategy. On the other hand, the proposed CNN topology sacrifices some performance (less than 1%), but it is simpler and easier to train. This property is useful in a real environment, given that it allows the model to be retrained and deployed.

Table 5. Result summary for different architectures

Experiment   Topology             Data input   Performance
2D-LeNet     LeNet*               LOB          59.56%
2D-AlexNet   AlexNet*             LOB          63.15%
2D-Own       Other CNN topology   LOB          58.23%
3D-LeNet     LeNet*               LOB + Tick   66.09%
3D-AlexNet   AlexNet*             LOB + Tick   66.83%
3D-Own       Other CNN topology   LOB + Tick   65.31%

5.2 Model Comparison Against Other DL Topologies

As observed in Table 6, the proposed model is very competitive, with the advantage that one model runs for several assets.

Table 6. Comparison against other DL topologies

DL topology                 Classes   Data used                    Directional accuracy
Multilayer Perceptron [1]   2         1 stock, tick data           66%
Deep Belief Network [16]    2         1 stock, LOB + tick data     57%
Proposed model (CNN)        3         12 stocks, LOB + tick data   65.31%

6 Conclusion and Future Research

Using a CNN for FTS prediction worked well. The directional accuracy shows that the results are very competitive, in fact better than other approaches tested before [1, 16, 19]. As expected, performance improves when both LOB and tick data are used in conjunction, and the main reason is simple: there is more market information. The image-like representation is useful and could even be extended; that is, it is possible to have more channels in the original input image (matrix).

Perceived advantages
– One network for multiple assets. This is not usually the case, given that each asset has its own dynamics. The image-like representation homogenizes the inputs, resulting in an image representing market information and allowing patterns to be found across the whole image set, regardless of the asset.
– Lifetime of the trained model. In financial applications frequent retraining is the norm. This approach extends the lifetime of the trained model due to the time invariance associated with images.

Perceived disadvantages
– It is a data-intensive technique. The more images available for training, the better the results.
– Training times are large, particular for complex architectures such as AlexNet, which uses several channels and several layers. – Preprocessing could be tricky. There are a lot of details to take into account when transforming raw data. In our experience, we suggest a trade-o? analysis between training times and lifetime of the trained model. For real implementations with an expected lifetime ranging from 5 min to a couple of hours, we think is hugely advantageous. This model should be tested with data from more liquid markets, to check preprocess-ing times as well as performance. We think that there are a lot of possibilities for improvement, including the use of combined approaches (LSTM and CNN), and to code more information in more channels, for example, technical information. References 1. Ar´evalo, A., Nino, J., Hern´andez, G., Sandoval, J.: High-Frequency Trading Strat-egy Based on Deep Neural Networks, pp. 424–436 (2016). https://doi.org/10.1007/ 978-3-319-42297-8 40 2. Arnold, L., Rebecchi, S., Chevallier, S., Paugam-Moisy, H.: An introduction to deep learning. In: ESANN (2011). https://www.elen.ucl.ac.be/Proceedings/esann/ esannpdf/es2011-4.pdf 3. Chao, J., Shen, F., Zhao, J.: Forecasting exchange rate with deep belief networks. In: The 2011 International Joint Conference on Neural Networks, pp. 1259–1266. IEEE (2011). http://ieeexplore.ieee.org/articleDetails.jsp?arnumber=6033368, http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=6033368 4. Chen, M., Ebert, D., Hagen, H., Laramee, R.S., van Liere, R., Ma, K.L., Ribarsky, W., Scheuermann, G., Silver, D.: Data, information, and knowledge in visualiza-tion. IEEE Comput. Graph. Appl. 29(1), 12–19 (2009) 5. Cont, R., Stoikov, S., Talreja, R.: A stochastic model for order book dynamics. Oper. Res. 58, 549–563 (2010) CNN with Limit Order Book Data for Stock Price Prediction 457 6. De Goijer, J., Hyndman, R.: 25 years of time series forecasting. J. Forecast. 22, 443–473 (2006) 7. Ding, X., Zhang, Y., Liu, T., Duan, J.: Deep learning for event-driven stock pre-diction. In: Proceedings of the Twenty-Fourth International Joint Conference on Arti?cial Intelligence (ICJAI) (2015). http://ijcai.org/papers15/Papers/IJCAI15- 329.pdf 8. Gould, M.E.A.: Limit order books. Quant. Financ. 13, 42 (2010) 9. Hamid, S., Habib, A.: Financial forecasting with neura networks. Acad. Acc. Financ. Stud. J. 18, 37–56 (2014) 10. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006). https://doi.org/10.1162/neco. 2006.18.7.1527, pMID: 16764513 11. Huang, G.E.A.: Trends in extreme learning machines: a review. Neural Netw. 61, 32–48 (2015) 12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classi?cation with deep con-volutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS 2012, vol. 1, pp. 1097–1105. Curran Associates Inc., USA (2012). http://dl.acm.org/citation.cfm?id=2999134.2999257 13. Laserson, J.: From neural networks to deep learning: zeroing in on the human brain. XRDS 18(1), 29–34 (2011). https://doi.org/10.1145/2000775.2000787 14. Lecun, Y., Bottou, L., Bengio, Y., Ha?ner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998) 15. L¨angkvist, M., Karlsson, L., Lout?, A.: A review of unsupervised feature learn-ing and deep learning for time-series modeling. Pattern Recognit. Lett. 42, 11–24 (2014). 
http://www.sciencedirect.com/science/article/pii/S0167865514000221 16. Nino, J., Hernandez, G.: Price direction prediction on high frequency data using deep belief networks. In: Applied Computer Sciences in Engineering, pp. 74–83. Springer (2016) 17. Olshausen, B.A., Field, D.J.: Natural image statistics and e?cient coding. Net-work Comput. Neural Syst. 7(2), 333–339 (1996). https://doi.org/10.1088/0954- 898X 7 2 014, pMID: 16754394 18. Sandoval, J.: Empirical shape function of the limit-order books of the USD/COP spot market. In: ODEON, p. 7 (2013). https://ssrn.com/abstract=2408087 19. Sandoval, J., Nino, J., Hernandez, G., Cruz, A.: Detecting informative pat-terns in ?nancial market trends based on visual analysis. Procedia Com-put. Sci. 80, 752–761 (2016). http://www.sciencedirect.com/science/article/pii/ S1877050916308407. International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA 20. Shen, F., Chao, J., Zhao, J.: Forecasting exchange rate using deep belief networks and conjugate gradient method. Neurocomput. 167, 243–253 (2015). https://doi. org/10.1016/j.neucom.2015.04.071 21. Takeuchi, L., Lee, Y.: Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks (2013) 22. Wang, Z., Oates, T.: Encoding Time Series as Images for Visual Inspection and Classi?cation Using Tiled Convolutional Neural Networks (2015). https://pdfs. semanticscholar.org/32e7/b2ddc781b571fa023c205753a803565543e7.pdf 23. Yeh, S., Wang, C., Tsai, M.: Corporate Default Prediction via Deep Learning (2014). http://teacher.utaipei.edu.tw/cjwang/slides/ISF2014.pdf Implementing Clustering and Classi?cation Approaches for Big Data with MATLAB Katrin Pitz(&) and Reiner Anderl Technische Universität Darmstadt, 64283 Darmstadt, Germany pitz@dik.tu-darmstadt.de Abstract. Data sets grow rapidly, driven by increasing storage capacities as well as by the wish to equip more and more devices with sensors and con-nectivity. In mechanical engineering Big Data offers new possibilities to gain knowledge from existing data for product design, manufacturing, maintenance and failure prevention. Typical interests when analyzing Big Data are the identi?cation of clusters, the assignment to classes or the development of regression models for prediction. This paper assesses various Big Data approaches and chooses adequate clustering and classi?cation solutions for a data set of simulated jet engine signals and life spans. These solutions include k-means clustering, linear discriminant analysis and neural networks. MATLAB is chosen as the programming environment for implementation because of its dissemination in engineering disciplines. The suitability of MATLAB as a tool for Big Data analysis is to be evaluated. The results of all applied clustering and classi?cation approaches are discussed and prospects for further adaption and transferability to other scenarios are pointed out. Keywords: Big DataClusteringClassi?cationK-means Discriminant analysisNeural networksMATLAB 1 Introduction When it comes to Big Data, there is no solitary, generally agreed-on de?nition, neither in academia nor in industry [1]. However, most experts agree on Big Data exceeding common storing capacities and computing methods [2]. It has also become popular to outline Big Data via the 3 Vs introduced by [3]: volume, velocity, and variety. Volume means that an increasing amount of data is to be handled, even though the speci?c numbers for when to start labeling data as Big Data vary. 
Velocity stresses the fact that data is generated, processed or modi?ed at high speeds, in some applications close to real time. Variety describes the state the data is in. This can range from structured data to semi-structured or unstructured data. Text written or spoken by humans is often referred to as unstructured data. Though, [2] emphasizes that many sources of Big Data are not as unstructured as they may seem at ?rst glance, but that it rather takes some extra time and effort to ?nd the logical flow they do possess. In addition to the three Vs wider de?nitions have been proposed over the years leading to ?ve or even more Vs depending on the source consulted. For example, [4] presents value and veracity as additional Vs with value considering the potential to contribute to entrepreneurial or © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 458–480, 2019. https://doi.org/10.1007/978-3-030-02686-8_35 scienti?c progress and veracity assessing the consistency and trustworthiness of the data. Some other characteristics of Big Data are its exhaustiveness (capturing entire populations or systems), flexibility (offering the possibility to add new aspects or expand in size) and relational character (allow for linking to other data bases) [1]. The sources and drivers of Big Data are numerous. Web data is referred to as the original Big Data [2] and often involves interests such as understanding customer behavior. It may include social media data, interaction data or voluntarily submitted data. Authors in [5] names mobile sensors, video surveillance, smart grids, geophysical exploration and medical experimentation as further drivers of the data deluge. In the ?eld of mechanical engineering the focus lies on data generated by machinery. A growing number of sensors and actuators are embedded into technical systems so that some even reach the state of operating completely autonomously. Furthermore, the interest in monitoring devices and equipment while it is in use is increasing rapidly. Cameras, GPS units and radio frequency identi?cation (RFID) tags are only some examples of how this development currently manifests itself [1]. Big Data is closely linked to the ?elds of business intelligence (BI) and data mining. It can be considered an extension of BI solutions as they are primarily built to analyze structured data whereas Big Data approaches aim to handle all kinds of data [6]. Still, BI solutions should not be discarded too quickly for the sake of Big Data strategies. It seems more promising to integrate and conjoin Big Data into the data a business already has and the methods that proved successful throughout its history [2, 7]. Data mining, on the other hand, denotes a set of methods to make use of data by discovering similarities, patterns, trends, outliers or clusters [8]. Established data mining techniques are focused on analyzing traditional, structured data [6]. Big Data now aims at larger amounts of data which are more complex in their structure. This does not necessarily mean that existing methods need to be overthrown and replaced, but it at least poses questions of scalability and adaption [2]. Moreover, it is to be discussed whether the tried and trusted data base language SQL (structured query language) will still serve the purposes. NoSQL, columnar databases, massively parallel processing (MPP) databases, cloud computing and frameworks like Hadoop are some of the new technologies on the rise [2, 6]. 
This paper addresses various Big Data approaches, highlights their advantages as well as their shortcomings and describes how they can be implemented with the help of MATLAB 2017a, an established software tool for engineering applications [9]. The data on which the implementation and validation is based stems from the National Aeronautics and Space Administration (NASA) – Prognostics Center of Excellence (PCoE). This institution collects and provides data sets from science and engineering that are free of cost and allow researchers and practitioners to explore and enhance data mining and machine learning algorithms [10]. The focus of this paper lies on clustering and classi?cation. In addition to implementation matters, general conclusions on MATLAB’s suitability for Big Data purposes are drawn and the scalability of existing MATLAB code is discussed. The paper divides into seven sections. The introduction given in this section is followed by a description of the data base in Sect. 2. Section 3 explains the criteria based on which the approaches for clustering and classi?cation are chosen and outlines Implementing Clustering and Classi?cation Approaches 459 their theoretical foundations. The implementation of these approaches in MATLAB is part of Sect. 4. Section 5 presents and discusses the results of both clustering and classi?cation. The paper concludes with an outlook on future work in Sect. 6 and a summary in Sect. 7. 2 Database The data set chosen for this paper is part of the NASA PCoE data repository. This repository currently comprises 16 data sets ranging from biology to electrical or mechanical engineering topics. What they all have in common is a time dependency and an information on failure, i.e. they represent time series from a speci?c starting condition until failure [10]. As this work is located in the ?eld of mechanical engi-neering a data set with an according background is chosen: “6 Turbofan Engine Degradation Simulation Data Set”. This set, introduced by [11], deals with a classical jet engine with the following main components: low pressure compressor (LPC), the high pressure compressor (HPC), the outer shaft (N1), the core shaft (N2), the high pressure turbine (HPT) and the low pressure turbine (LPT). The data are the results of simulations using an engine model. It is not a record of signals transmitted by engines physically existing and operated by airlines. Variations in the production quality of the original engines and degradation effects are included in the simulation. Each time series in the data set starts at an arbitrary point in the engine’s life where it is not as good as new anymore but has not failed yet. The data set separates into training data and test data. The training data serve to train a model whereas the test data are used to validate the accuracy of the created model. The time series from the training data provide the time of failure. They contain all data points from starting condition to failure. The test data time series, on the contrary, cut off at a point prior to engine failure. The created model can then be used to estimate the remaining useful life (RUL) of the engine. Time series enclose 21 different signals an engine would provide, e.g. temperatures, pressures, shaft speeds and amounts of fuel and coolant. Three more signals that are useful to determine an engine’s operation condition are available in each time series: flight altitude, Mach number, and throttle angle. 
However, these signals shall not be discussed in more detail, as one of the paradigm shifts in applying Big Data approaches is to focus more on what the data themselves reveal on a statistical level and less on building physical models that are comprehensible in all their interrelationships [12]. The entire data set is divided into five different subsets varying in complexity. Some subsets show 6 different operating conditions, some only show 1 operating condition. Analogously, some subsets exhibit 2 different failure mechanisms while others only have 1 failure mechanism. This information on subsets, operating conditions and failure mechanisms is available with the data set itself. Table 1 gives an overview of how the data set divides into subsets.

Table 1. Subsets of the engine data set

Subset   Number of operating conditions   Number of failure mechanisms
1        1                                1
2        6                                1
3        1                                2
4        6                                2
5        6                                1

The size of the chosen data set is 12 MB. This is a relatively small size, considering that some authors claim the lower boundary of Big Data to be several terabytes or petabytes [4]. However, a clear definition of how big Big Data has to be does not exist [5]. Even though the data set may not have the highest volume, the remaining V criteria should not be dismissed. For example, it exhibits high variety and value characteristics. Furthermore, it is feasible to test Big Data approaches with this data set while simultaneously allowing for upscaling to larger amounts of data in the implementation.

3 Chosen Approaches

There are different motivations for building models based on the jet engine data described above. Typical engineering questions, which would be of interest for an engine operator as well, are:
• Are operating conditions and failure mechanisms identifiable based on the signals alone?
• How should an alarm system for imminent engine failures be designed?
• How can the remaining useful life of an engine be estimated?
In terms of data analysis, the first question relates to clustering, the second to classification and the third to regression or, more generally, prognostics. This paper focuses on the former two as they lay a base for further prognostic tools. Moreover, assessing clustering and classification techniques allows a comparison of supervised versus unsupervised learning [13].

3.1 Clustering

Clustering aims at identifying different groups of related data within a larger data set. The grouping is carried out based on the mere data. No additional information stating which point or series belongs to which group is available. A verification of whether or not the data have been clustered correctly is not possible. Clustering is therefore considered a method of unsupervised learning [13]. For the chosen data set it is known that 6 different operating conditions and two different failure mechanisms exist. However, it cannot be retrieved which time series is from which group. It can be considered a classical clustering scenario, extended by the fact that the number of clusters is explicitly given. Data within one cluster shall be as homogeneous as possible whereas the clusters themselves shall be as distant from one another as possible. Different distance measures are a main distinguishing feature between different clustering methods [14]. Established methods include hierarchical clustering, k-means clustering and Gaussian mixture models.
Hierarchical clustering methods do not need a priori information on how many clusters are expected, but reveal an initially unknown cluster structure within the data set. The major drawback is that hierarchical methods are accompanied by high computational costs [15]. k-means clustering and Gaussian mixture models both belong to the field of partitioning clustering. They both need the information on the number of clusters to be found. k-means clustering strictly assigns data points to clusters whereas Gaussian mixture models calculate belonging probabilities. For this work, k-means clustering is chosen as it is computationally efficient [15] and well compatible with MATLAB and other Big Data technologies such as Hadoop and MapReduce.

The basic idea of deploying k-means clustering is to divide all n elements into k disjoint clusters so that the Euclidean distance between elements and cluster centers is minimized. The clusters' centers are denoted in the matrix M = [m_1, ..., m_k]. Each vector m_j contains the center of the j-th cluster C_j, which is calculated as follows:

  $m_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i,$  (1)

with n_j being the number of elements belonging to the j-th cluster and x_i the values of its i-th observation. The algorithm for performing k-means clustering can then be described by the following four steps [14]:
• Initialize clusters by specifying cluster centers, either randomly or deliberately. Calculate the preliminary matrix M based on the specified cluster centers.
• Assign each element in the data set to its nearest cluster C_l, i.e.

  $x_i \in C_l \ \text{ if } \ \lVert x_i - m_l \rVert < \lVert x_i - m_j \rVert \ \text{ for } i = 1, \ldots, n, \; j \neq l, \; j = 1, \ldots, k.$  (2)

• Update the matrix M based on the current assignment of elements to clusters using (1).
• Repeat the second and third step until no further changes occur in the cluster allocation.

k-means clustering is dependent on the initial choice of cluster centers. The algorithm converges to a local minimum of distances between elements and centers. Depending on the initial centers, the final clusters may vary. Choosing them therefore becomes an essential part of performing k-means clustering. However, choosing them by hand is laborious and opposed to the idea of evaluating Big Data as automatically as possible. A purely random selection of initial cluster centers, on the other hand, may lead to long run times of the algorithm and clusters that are not close to the optimal solution [16]. An algorithm that overcomes both shortcomings by choosing starting centers based on weighted probabilities that account for the structure in the data is called k-means++ and was first proposed in [17]. k-means++ chooses the first center c_1 randomly from all elements available in the data set. It then calculates the distances D(x_i) of all elements to the first center. The following center c_2 is chosen based on a weighted probability, ensuring that elements are more likely to be chosen the higher their D^2 value, i.e. their squared distance from the first center, is. After that, D(x_i) is calculated again for each element, now denoting the smallest distance between x_i and any center chosen so far. The next center is chosen based on the updated D^2 probabilities. These last two steps are repeated until all k starting centers have been set.
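To make the seeding and update steps concrete, here is a minimal NumPy sketch of k-means++ initialization followed by the Lloyd iteration of Eqs. (1) and (2). The function names and the convergence check are illustrative; the authors rely on MATLAB's built-in implementation (Sect. 4), not code like this.

```python
import numpy as np

def kmeans_plus_plus_init(X, k, rng):
    """k-means++ seeding: first center uniform at random, later centers D^2-weighted."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)     # squared distance to nearest chosen center
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centers)

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd iteration (Eqs. 1 and 2) on top of k-means++ starting centers."""
    rng = np.random.default_rng(seed)
    M = kmeans_plus_plus_init(X, k, rng)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)                      # assignment step, Eq. (2)
        new_M = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else M[j]
                          for j in range(k)])               # center update, Eq. (1)
        if np.allclose(new_M, M):
            break
        M = new_M
    return labels, M

# Toy usage: two well-separated blobs are recovered as two clusters.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (50, 3)),
               np.random.default_rng(2).normal(5, 0.1, (50, 3))])
labels, centers = kmeans(X, k=2)
```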
Modifications of k-means clustering are k-medians clustering and k-medoids clustering. The use of medians makes the method more robust with respect to outliers. k-medoids clustering extends the original method by requiring that each cluster center coincides with an element of the data set. This makes the method applicable to categorical data as well. However, both extensions are not necessary for the data considered in this paper, so k-means clustering is chosen for the implementation. Prior to running the classical k-means clustering, the above-mentioned k-means++ is applied to determine the cluster centers to start with.

3.2 Classification

Classification follows a similar aim as clustering but is part of supervised learning [13]. It also intends to sort data into groups, in this case called classes, which are as homogeneous as possible. What sets classification apart from clustering is that in classification procedures information on the actual class affiliation is available. The model is trained with a set of training data for which the true class of each element is known. The trained model can then be used to assign new data, for which the class affiliations are unknown, to the appropriate classes. The main interest in the jet engine scenario lies on the remaining useful life of the individual engines. An operator of engines might wish to know which engines are close to failure so that failure may be avoided by means of shop visits and maintenance. Proximity to failure is indicated by low RUL values, given in flight cycles; e.g. RUL = 5 means that the engine will only be able to perform five more flights before it fails. Creating a warning system based on RUL values and their criticality is a legitimate, self-evident use case for classification. Three classes are defined in Table 2.

Table 2. Classes for engine failure warning system

Class no.   Range of values   Significance                    System action
1           0 ≤ RUL ≤ 25      Engine very close to failure    Alarm
2           25 < RUL ≤ 125    Engine heading toward failure   Warning
3           RUL > 125         Normal operation                None

Classification methods include decision trees, k-nearest neighbors, support vector machines, naive Bayes, and discriminant analysis. An extensive introduction can be found in [18]. All methods have advantages as well as shortcomings, so that a general statement on which method is superior to another without considering the specific use case is hardly possible. A problem of classification that might arise regardless of the chosen method is the phenomenon of overfitting. Overfitting denotes the effect that a classification algorithm adapts overly well to the training data, i.e. scores a high accuracy within this subset of data, but has a high error rate when classifying test data [8]. One way to reduce overfitting is the use of cross validation. The data set is then divided into k subsets. The algorithm is trained with k − 1 of these sets, leaving the k-th one for validation. This procedure is repeated until each subset has once been the validation set. It obviously increases the computational cost compared to the more basic holdout validation, which divides the data set into training and validation data only once. It can be considered a trade-off between overfitting reduction and computational efficiency. In this work, the decision is taken in favor of holdout validation.
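The three classes of Table 2 amount to a simple thresholding rule on the RUL value. The hypothetical helper below (not part of the authors' MATLAB code) just makes the class boundaries explicit.

```python
def warning_class(rul):
    """Map a remaining-useful-life value (in flight cycles) to the classes of Table 2."""
    if rul <= 25:
        return 1   # alarm: engine very close to failure
    if rul <= 125:
        return 2   # warning: engine heading toward failure
    return 3       # normal operation

assert [warning_class(r) for r in (5, 25, 80, 200)] == [1, 1, 2, 3]
```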
Linear Discriminant Analysis. For this implementation a linear discriminant analysis is chosen, based on the facts that the linear case is efficient to calculate, allows quick classification and is supported by MATLAB's capabilities. The main reasons to dismiss the other classification possibilities are the following: naive Bayes is a rather simple method that has its strength in serving as a benchmark for other methods. Support vector machines allow quick classification and are highly generalizable but go along with high computational effort, the need for transformations in specific cases [19] and an incompatibility with MATLAB's Big Data functions. k-nearest neighbors is disqualified because it is a method prone to outliers [15] and adverse in terms of memory space, as the whole data set has to be kept available as long as the algorithm is carried out. Decision trees give the opportunity to understand the classification but need downstream pruning steps [18] or parallelization in the form of random forests [20] to handle overfitting.

Discriminant analysis is a method from the field of multivariate statistics. At first, a distribution function is calculated for each class. Commonly, a multivariate normal distribution is chosen, whose density function is [21]

  $f_X(x) = \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right).$  (3)

X is the p-dimensional random variable that, in the engine data example, is composed of the different signals each engine provides, as mentioned in Sect. 2. μ, the vector of means, and Σ, the covariance matrix, are to be determined individually for each class. The borders between two classes are defined as where their density functions have the same value. The functions describing those borders are called discriminant functions. If the assumption of identical covariance matrices among all classes is fair, the method simplifies to linear discriminant analysis. The discriminant functions are then hyperplanes or, in a two-dimensional case, linear functions, as shown in Fig. 1.

Fig. 1. Example of a linear discriminant analysis for two dimensions [22].
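Under the shared-covariance assumption, the decision rule that follows from Eq. (3) can be written down compactly: estimate per-class means, a pooled covariance matrix and class priors, and assign each observation to the class with the largest linear discriminant score. The NumPy sketch below illustrates that rule; the class name and structure are ours, not the authors' MATLAB implementation.

```python
import numpy as np

class LinearDiscriminant:
    """Bare-bones LDA with a pooled covariance matrix (the assumption that makes the borders linear)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n, p = X.shape
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.priors_ = np.array([(y == c).mean() for c in self.classes_])
        resid = np.vstack([X[y == c] - m for c, m in zip(self.classes_, self.means_)])
        self.cov_inv_ = np.linalg.pinv(resid.T @ resid / (n - len(self.classes_)))
        return self

    def predict(self, X):
        # Linear discriminant scores; the largest score decides the class.
        scores = (X @ self.cov_inv_ @ self.means_.T
                  - 0.5 * np.sum(self.means_ @ self.cov_inv_ * self.means_, axis=1)
                  + np.log(self.priors_))
        return self.classes_[np.argmax(scores, axis=1)]
```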
Neural Network. As an alternative to linear discriminant analysis, classification is also carried out with the help of a neural network. The reasoning behind that is to create the option of comparison and to give a prospect for future work that might expand into the field of regression, for which neural networks are also suitable [18]. Neural networks have become popular, sometimes being advertised as a magical solution to all computational problems [18]. They are in fact a very powerful and general method that can in theory approximate any complex interrelations [8]. A neural network is a nonlinear statistical model whose number of layers and whose activation functions influence the complexity the model is able to represent [18]. It is best applied in settings where prediction is more important than interpretation of the results [18]. Neural networks can be considered a simulation of the human brain and its learning process. They involve neurons, weighted connections, and external stimuli. In a living organism, learning signifies the strengthening of synaptic connections between neurons in response to an external stimulation that has been received. In the neural network this can be modeled via weights and activation functions [8]. Figure 2 shows the general structure of a neural network with its input neurons, output neurons, and two exemplary hidden layers. Hidden layers do their name justice as they are not directly observed but only used internally in the calculation process.

Fig. 2. General structure of a neural network [23].

In a classification scenario with k classes, the number of neurons in the output layer is k as well, so that each neuron represents one class. The input neurons stand for the signals the model is fed with. The hidden layers in between represent the model to be trained in order to assign a data element with certain input signal characteristics to its appropriate class. This means that in the jet engine use case 21 signals can be drawn upon for input neurons, and the 3 classes defined in Table 2 serve as output neurons. Each connection is allocated a weight w_ij. The first index i denotes the predecessor this connection comes from; the second index j stands for the layer of the network that is currently in focus. The variables a_i state whether or not a connection is activated. The sum of all incoming a_i, weighted with the associated w_ij, is calculated by

  $z_j = \sum_{i=0}^{n} w_{ij} a_i,$  (4)

with n being the number of preceding neurons. The value of z_j is then fed into the so-called activation function g(z_j). Typically, the sigmoid function

  $g_{\mathrm{sigmoid}}(z_j) = \frac{1}{1 + e^{-z_j}}$  (5)

is chosen for this purpose. An alternative worth considering, especially with regard to performance in MATLAB [24], is the hyperbolic tangent function

  $g_{\tanh}(z_j) = \frac{2}{1 + e^{-2 z_j}} - 1.$  (6)

The result of the function g(z_j) gives the activation a_j the neuron propagates into the next layer of the network. Figure 3 illustrates the activation process. The neuron shown in this figure exhibits a bias fed into it, represented by a_0, which is constantly 1, and weight w_0j, which is a standard modeling technique [18, 25].

Fig. 3. Model of one neuron [25].

If a neural network only sends signals to its subsequent layers, as discussed so far, it is called a feedforward network. This is the approach widely used [18]. Networks which send signals back to their preceding layers exist as well and are sometimes referred to as networks possessing a memory. The more common name is recurrent neural network [26]. One difficulty in using a neural network for classification is to determine an adequate size. There are no established rules on how many layers and neurons to use. It rather is an iterative process of experimentation, facilitated by expertise and experience, to find the right size for the specific scenario [18]. If too many neurons are chosen, overfitting occurs. If there are too few neurons, the network might not be able to sufficiently model complex interrelations in the data. The size of the neural network can either be determined in a destructive approach or in a constructive one [27]. Destructive in this case means that the starting point is a big network from which neurons are then gradually removed until the performance of the network starts to decrease. Opting for the constructive approach means starting with a small network and adding neurons until the performance is not enhanced any further. Once the structure of the neural network is set, it has to be trained. The training data subset is used for this step. The generic approach to minimize errors is to use a gradient descent method, also called backpropagation. Detailed equations can be found in [18]. The fastest algorithm MATLAB offers for training neural networks with up to several hundred neurons is the Levenberg–Marquardt backpropagation algorithm [28]. It was first proposed in [29] and applied to neural networks in [30].
The main underlying idea is to avoid calculating the computationally intensive Hessian matrix and to use an approximation instead. In this work, a standard feedforward neural network with bias and hyperbolic tangent activation functions is chosen. The size of the network is determined via the destructive approach described above. Backpropagation is carried out via the Levenberg–Marquardt algorithm. Both the results of the linear discriminant analysis and those of the neural network are discussed and compared in Sect. 5.

4 Implementation with MATLAB 2017a

The implementation of the chosen approaches to process the engine data set and solve the problems of clustering and classification is carried out using the programming environment MATLAB, release 2017a. Even though MATLAB may not be the most popular programming language when judged in an overall comparison, it is still listed around rank 20 in current rankings [31, 32]. It has its strengths in matrix-based numerical calculations and is widely used in science and engineering. Since release 2016b, MATLAB offers new functionalities for handling Big Data, e.g. tall arrays, a new data type that allows users to carry out calculations with data that would actually be too big to fit into the working memory, by breaking the data down into heaps and evaluating equations repeatedly. This process can also be parallelized. Through free educational licenses for teaching staff and affordable student licenses, MATLAB has gained some popularity in academia. This may explain why graduates are acquainted with it and have established it in industry as well. Assuming that MATLAB is an available tool for practitioners in the field of mechanical engineering, this paper aims at exploring how and to what extent it can be used to dive into Big Data analysis. In order to fully reproduce the results discussed in this paper, the following MATLAB components are required:
• MATLAB R2016b or newer,
• MATLAB Parallel Computing Toolbox,
• MATLAB Statistics and Machine Learning Toolbox,
• MATLAB Neural Network Toolbox.
Operating conditions or failure mechanisms are not yet considered. datasample is the MATLAB function used for this sampling step. The reasoning behind the sampling step is that the randomly chosen points are representative for the entire set and that it is more ef?cient in terms of computational cost to explore the sample rather than the entire set. Scatter plots are chosen as an easy and intuitively accessible means of data exploration. MATLAB’s function to create these is named scatter. Figure 5 exemplarily shows the scatter plots for signals 10 to 21. All x-axes show the negative RUL value. All y-axes are without unit because of standardization. It is evident that some signals show a clear trend over time while others remain unaffected by time or just react with increased noise as time advances. Signal 11 for example has a positive trend, i.e. when an engine is close to failure signal 11 tends to have high values. Signal 21 gives an example of a negative trend, i.e. its values decrease the closer an engine gets to failure. Signals like those two examples should be Fig. 5. Signals of the engine data set, subset 1, training data, sample of 5000 points, z-scored, plotted over negative RUL values. Implementing Clustering and Classi?cation Approaches 469 included into models because their tendencies can help to categorize new data. Signal 10 exhibits no trend over time but stays constant. Therefore, it cannot contribute information to a model that is built on time-dependencies. Signal 17 shows a weak positive effect but not as distinct as others do. It could be argued whether or not to include it. To opt for the safe side, it is dismissed in this work. Signal 14 is exemplary for a signal that has a varying amount of noise. One might tend to interpret the points close to RUL = 0 as an upward trend, but indeed they are just scattered further around a signal value of 0. As the time span just before failure is of special interest for a warning system, a signal with high noise in this area is of little help and should also be excluded from the model. Applying this reasoning to all signals available, the ?rst half not shown in Fig. 5 and the second half documented in Fig. 5, the list of relevant signals to train time-dependent models with results in: 2; 3; 4; 7; 8; 11; 12; 13; 15; 20; 21: This can be considered a dimensional reduction. The original 21 signals were reduced to 11 relevant ones. Reducing dimensions is a standard step in preparing data for statistical learning algorithms. The less information is dragged along unnecessarily the more ef?cient the algorithms work. Choosing the relevant inputs manually works ?ne for a reasonable number of input variables. If the number increases, the process can easily be automated, e.g. with the help of correlation coef?cients. corrcoef is the corresponding MATLAB function. Note that Fig. 5 only shows a subset of the engine data set. The reduced list of signals is to be seen as a ?rst attempt at the least complex case of 1 operating condition and 1 failure mechanism which is represented by subset 1. Processing other subsets may require further selection of signals. 4.3 Parallel and Distributed Computing When processing large amounts of data, as is typical for Big Data applications, there are two steps to be considered in order to optimize computing times: parallelizing and distributing the computation. Parallel computing refers to the internal processes in one device, e.g. a laptop, workstation computer or computing server. 
Computations are divided among multiple processor cores of this device. Distributed computing enhances this concept by involving more than one device. Computing clusters are one way to realize this. Making data ?t for parallel and distributed computing usually requires some steps in front. Working with MATLAB and the described engine data set, those are the following: First of all, CSV ?les are created. Each CSV ?le contains the data of one engine. All ?les are then pooled together with the help of a datastore object. A datastore object in MATLAB does not create one large variable or container with all the separate data in it but solely captures the storing path of the ?les. When data are needed for calculation they are transformed from the datastore object into a tall array. tall arrays do not load all the data into the working memory at once but process data in heaps. When tall arrays appear in a MATLAB script the respective 470 K. Pitz and R. Anderl equations are not evaluated immediately. An explicit gather command is needed to execute calculations. The general aim when writing MATLAB code for Big Data is to reduce gather commands to a minimum, because they are what drives computational cost. It should also be checked whether all functions used are compatible with tall arrays. Some examples used in this work that support the use of tall arrays are: zscore, kmeans, discretize, and double. Self-written functions can handle tall arrays as well. In this paper, tall arrays are evaluated locally, using all processor cores available. This form of parallelization is why the MATLAB Parallel Computing Toolbox is necessary for executing the code. The size of the data set does not make the use of distributed computing necessary. However, if bigger data sets were processed, the same MATLAB code would still be applicable with only slight adjustments via the mapreduce function. This would allow for the use of computer clusters or cloud computing solutions such as Hadoop and Spark. The neural network used for comparative purposes in the classi?cation scenario functions without tall arrays but has its performance optimized by the MATLAB Neural Network Toolbox as well as by parallelization. The third toolbox in use, MATLAB Statistics and Machine Learning Toolbox, does not provide for parallel or distributed computing but for the statistical methods themselves. It offers pre-de?ned functions for support vector machines, decision tress, k-nearest neighbors, k-means, k-medoids, hierarchical clustering and many more, some of which are directly applied to obtain the results discussed in the next section and some of which were adduced as comparisons beforehand in order to ?nd the right approaches for the engine data scenario. 5 Results and Discussion This section presents and discusses the results of both the clustering and the classi?- cation problem. 5.1 Clustering Clustering is carried out in order to determine groups of engines with similar operating conditions and failure mechanisms. Clustering for Operating Conditions. Different operating conditions are only prevalent in subsets 2, 4, and 5. Therefore, only those subsets are subject to this kind of clustering. Input variables are the three condition signals flight altitude, Mach number and throttle angle as mentioned in Sect. 2. Having the information that these three allow to deduce how the engine is operated while all other signals are just simulated sensor signals recording internal processes in the engine, makes them an easy and obvious choice. 
Clustering has been performed on the training data only. Two iterations of k-means clustering were needed to identify all six clusters shown in Fig. 6.

Fig. 6. Identified clusters for operating conditions, subset 2, training data, sample of 5000 points, using negative RUL values.

All clusters turn out very concentrated, making them appear like six single points even though a total of 5000 points is plotted. The cluster centers are given in Table 3. The results for subsets 4 and 5 are similar, showing highly concentrated centers as well.

Table 3. Cluster centers for operating conditions in subset 2

Cluster      Cond. 1 (flight altitude)   Cond. 2 (Mach number)   Cond. 3 (throttle angle)
1 (yellow)   2                           0.00                    100
2 (green)    25003                       0.62                    60
3 (red)      45003                       0.84                    100
4 (purple)   20003                       0.70                    100
5 (blue)     10003                       0.25                    100
6 (orange)   35003                       0.84                    100

Clusters that are as clearly distinguishable as these could have been identified manually just as well. Nevertheless, automated clustering involves much less effort and is more generalizable, as it can also be used for complex, spread-out clusters. The results obtained from the training data can be transferred to the test data. No modifications need to be made. It could be considered to use the cluster centers identified from the training data as starting points for a clustering algorithm applied to the test data. Still, the k-means++ algorithm, which does not need manual input for starting centers, proved to be very effective in this scenario as well, given the fact that only two iterations were necessary.

Clustering for Failure Mechanisms. Subsets 3 and 4 exhibit different failure mechanisms and have therefore been considered in this part of the clustering. It is assumed that failure is a time-dependent phenomenon for the engine scenario. The closer an engine is to failure, the higher or lower certain signals will be, indicating malfunctions in parts of the engine. 21 signals are available in total. Section 4.2 gives a list reduced to 11 signals that show a clear tendency over time. For this clustering it has been compared whether using all 11 signals or further reducing the number of input variables is more efficient. The decision is taken in favor of reduction. The essential signals could be reduced to: 7, 12, 15, 20, 21. Figure 7 shows why they are the most useful signals for identifying clusters of failure mechanisms. All signals chosen as inputs have a clear diverging trend towards RUL = 0. A comparison with Fig. 5, in which a subset with only 1 failure mode is plotted and no such diverging point clouds can be spotted, suggests that this is a valid indicator for the failure modes in this case. The two different failure modes identified via k-means clustering are already highlighted in Fig. 7. For signal 15, for example, it can be concluded that high values towards the end of the engine's life indicate the first failure mode (red) while low values indicate the second one (blue). Plotting the clusters as in Fig. 6 is no longer feasible, as more than three dimensions are used for the failure mechanism clustering. The cluster centers are summarized in Table 4.

Table 4. Cluster centers for failure mechanisms in subset 3

Cluster   Sign. 7   Sign. 12   Sign. 15   Sign. 20   Sign. 21
1         551.62    519.92     8.52       38.47      23.09
2         567.57    534.94     8.24       39.57      23.75

It is striking that the cluster centers are very close to each other with respect to all five signals. Table 3 showed greater distances, at least for input Cond. 1. Still, the k-means algorithm could identify the failure mechanism clusters as efficiently as before. Again, the results are obtained after two iterations.
Signals of the engine data set, subset 3, training data, sample of 5000 points, z-scored, plotted over negative RUL values, different failure modes color-coded in blue and red.

Table 4. Cluster centers for failure mechanisms in subset 3

Cluster   Sign. 7   Sign. 12   Sign. 15   Sign. 20   Sign. 21
1         551.62    519.92     8.52       38.47      23.09
2         567.57    534.94     8.24       39.57      23.75

Only training data are used for clustering. The time-dependency makes data points close to RUL = 0 more valuable than those with high RUL values. Hence, only the last ten points of each time series are considered. In some cases those ten data points from the same engine are not all assigned to the same cluster. However, as an engine is assumed to fail from only one failure mechanism, a clear assignment to one or the other cluster has to be made. Whenever this case occurs, the cluster the engine is assigned to most often out of the ten times is chosen. Time-dependency is what makes it difficult to transfer the failure mode clustering from the training data to the test data. In the training data set all time series are available until the event of failure, whereas in the test data set time series are cut off at a random RUL value, potentially a high one. For test data with a low RUL value it might be possible to apply the clusters identified from the training data, as the diverging trends in the relevant signals already show their effects. For new data with high RUL values this will, if at all, be accompanied by great uncertainty. Furthermore, it should be stated that subset 4 requires nested clustering, as multiple operating conditions and multiple failure mechanisms are present at the same time. This is why the result for subset 4 consists of six pairs of clusters. The clustering for failure mechanisms is carried out after the clustering for operating conditions but otherwise does not differ from the procedure described before.

5.2 Classification

Classification has the aim of assigning elements of the engine data set to the right class of criticality regarding the RUL value. Three classes have been defined in Table 2. The quality of the classification can be evaluated because the actual class affiliations are available. Some erroneous classifications may be rated more undesirable than others. Considering a warning system for engine failure, it is worse to receive a normal-operation prompt when actually a warning should be given than to receive an erroneous warning when the engine is still in normal condition. The clustering results are reused as additional inputs for classification. For example, engines that were identified as belonging to the same failure mechanism may be more likely to fall into the same class of criticality as well.

Classification via Linear Discriminant Analysis. The first method applied for classification is linear discriminant analysis. It needs a training time of 1.3 s on a contemporary, off-the-shelf laptop (Lenovo Thinkpad E550, Intel Core i5-5200 processor). Training and processing of the entire data set takes approximately 10 s. The results are summarized in the form of a confusion matrix in Fig. 8. The diagonal of the confusion matrix documents correct classification, e.g. the upper left corner of the matrix states that 8.6% of all data elements (5249 in absolute numbers) have been classified as alarm and were real alarm cases.
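For orientation, a result like the confusion matrix in Fig. 8 can be obtained with a few toolbox calls. The following is only an illustrative sketch: the paper does not list its exact function calls, and the names fitcdiscr, predictors, and classes are assumptions, with predictors standing for the reduced signals plus the clustering results and classes for the criticality class of each data element.

% Illustrative sketch, not the authors' original code.
mdl  = fitcdiscr(predictors, classes, 'DiscrimType', 'linear');   % linear discriminant analysis
pred = predict(mdl, predictors);

C = confusionmat(classes, pred);        % rows: true class, columns: predicted class
accuracy = sum(diag(C)) / sum(C(:));    % overall fraction of correctly classified elements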
The lower right corner sums up all diagonal entries, showing that in total 74% of all elements have been classified correctly, whereas 26% have been misclassified. Three groups of misclassifications should be looked at more closely: cases in which the target class was alarm but the model chose only warning or normal, and cases in which the engine was operating normally but the model gave an alarm. The first two mislead the operator into overestimating the engine's performance and not considering checkup or maintenance work. The latter may lead to premature shop visits and cause unnecessary costs. For safety reasons, the first two are to be considered even more critical than the latter, which has economic consequences only. The fact that all three of these misclassifications occur at a very low rate, 2.0, 0.0 and 0.1% respectively, indicates good quality of the trained model. Another aspect to be considered when developing an engine failure warning system is that there should be at least one alarm before engine failure. Engines failing without prior notice are highly undesirable in the intended system. Figure 9 shows that no such case occurred for the linear discriminant analysis model. Vertical lines in Fig. 9 represent individual engines. The y-axis shows the simulated life in flight cycles. It can be concluded from the plot that most engines start in normal condition, actually operating normally and correctly classified as such. As the simulations start at an arbitrary point in an engine's life, some engines already show warning condition at the beginning of the recorded time. The red tips of all lines demonstrate that each engine has given multiple alarms before failure. Engine 118, for example, has the highest line in the plot and passes through all three phases, starting in normal condition, transitioning into warning and finally reaching alarm state, initially giving some premature alarms but then being correctly classified for RUL ≤ 25. The fact that all engines give alarms, in case of doubt rather too early than not at all, emphasizes the well-functioning of the warning system based on linear discriminant analysis.

Fig. 8. Confusion matrix for classification with linear discriminant analysis, subset 4.

Classification via Neural Network. The second method applied for classification is a neural network. Following the destructive approach leads to 20 neurons in one hidden layer. 17 inputs, consisting of a reduced number of signals according to Sect. 4.2 and the clustering results, are used. Figure 10 shows the neural network as modeled in MATLAB. Training this network to the point that a valid model is found takes 170 iterations on average. Using the same laptop as before, this is equivalent to approximately 12 s. The classification results obtained via the described neural network are summarized in the confusion matrix in Fig. 11. The sum of all diagonal elements is 74.1%, almost the same as with linear discriminant classification. 25.9% of all data elements are still misclassified. However, the three most severe misclassifications have values of 2.4, 0.0, and 0.1%, again almost identical to the results obtained via linear discriminant analysis, which are acceptably low. The neural network scores a slightly worse rate for classifying alarm conditions as such but is slightly better at correctly classifying warnings.

Fig. 9. Displayed alarms, warnings and normal conditions when the system is trained via linear discriminant analysis, subset 4.
Fig. 10.
Neural network modelled in MATLAB. 476 K. Pitz and R. Anderl Figure 12, when compared to Fig. 9, also highlights the fact that the warning system trained via neural network behaves almost identical to the one based on linear discriminant analysis. All engines display alarms before failure which is the preferred characteristic for the warning system discussed in this work. Fig. 11. Confusion matrix for classi?cation with neural network, subset 4. Fig. 12. Displayed alarms, warnings and normal conditions when system is trained via neural network, subset 4. Implementing Clustering and Classi?cation Approaches 477 6 Outlook The results presented in this paper offer various connecting points for further research. One promising next step may be to broaden the focus from clustering and classi?cation to also include regression. In the considered use case regression models could be used to estimate the remaining useful life of the engines. It should be examined to which extent regression models can pro?t from clustering and classi?cation results already obtained for the data set. Further enhancements could include image or video data to prove that the methods are also applicable for high variety data. In general, bigger data sets should be con-sidered for further validation. Integration of cloud solutions or distributed server structures should be tested. Applying the approaches to data sets from other technical systems could further prove their generalizability. 7 Summary In this paper, a data set for applying Big Data approaches in a mechanical engineering scenario has been chosen. Various Big Data approaches have been assessed and compared. A problem de?nition of clustering and classi?cation has been formulated. For these two problems k-means clustering, linear discriminant analysis and neural networks have been identi?ed as adequate methods. All three methods have been implemented using the programming environment MATLAB 2017a. Above all, datastore objects, tall arrays and gather com-mands are crucial for enabling MATLAB scripts for Big Data. The code produced constitutes a basis for further extension. Bigger data sets could be processed spreading the computation among a greater number of cores with the help of MATLAB’s Parallel Computing Toolbox or involving computing clusters or cloud solutions via mapre-duce settings. Moreover, existing MATLAB scripts for any purposes can be adapted for Big Data use based on the insights gained by these examples. All that has to be considered is whether all functions that are used support tall arrays and whether the program sequence should be adjusted to minimize the number of gather commands. MATLAB proved to be an adequate tool for analyzing large amounts of stored data stemming from engine simulations. If it is still powerful enough when additional challenges like near real-time data or highly unstructured social media data arise remains to be proven. The results of the methods themselves show that k-means clustering with k-means+ + initialization is very fast and effective in identifying operating condition and failure mechanism clusters in the engine data, reaching plausible results within two iterations. Comparing linear discriminant analysis and a feedforward neural network with one hidden layer shows a very similar performance for both when three de?ned classes for RUL values are the underlying scenario. Both reach approximately 74% of correct classi?cations and 2% or less for misclassi?cations considered especially severe. 
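For completeness, the shallow network referred to in this comparison — one hidden layer with 20 neurons, fed with the 17 inputs described in Sect. 5.2 — could be set up along the following lines. This is a hedged sketch only: the paper shows the network as a figure and does not list its code, and the variable names inputs and targets are assumptions.

% Illustrative sketch of the shallow classification network.
net = patternnet(20);                    % pattern-recognition network with one hidden layer of 20 neurons
% inputs:  17 x N matrix of predictors (reduced signals plus clustering results)
% targets: 3 x N one-hot matrix for the classes normal / warning / alarm
[net, tr] = train(net, inputs, targets); % training with the toolbox default settings
outputs = net(inputs);
plotconfusion(targets, outputs);         % confusion matrix comparable to Figs. 8 and 11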
The neural network is easier to implement in MATLAB, more generalizable but less suitable whenever interpretation of results is a focus as well. The linear discriminant analysis proved to be slightly faster than the neural network. 478 K. Pitz and R. Anderl References 1. Kitchin, R.: The Data Revolution. SAGE, Los Angeles (2014) 2. Franks, B.: Taming the Big Data Tidal Wave. Wiley, Hoboken (2012) 3. Laney, D.: 3D Data Management: Controlling Data Volume, Velocity, and Variety, https:// blogs.gartner.com/doug-laney/?les/2012/01/ad949-3D-Data-Management-Controlling-Data- Volume-Velocity-and-Variety.pdf. Accessed 01 June 2018 4. Demchenko, Y., Grosso, P., Laat, C., de Membrey, P.: Addressing Big Data issues in scienti?c data infrastructure. In: IEEE (ed.) 2013 International Conference on Collaboration Technologies and Systems (CTS) (2013) 5. Long, C., Talbot, K., Gill, K. (eds.): Data Science & Big Data Analytics. Wiley, Indianapolis (2015) 6. Simon, P.: Too Big to Ignore. Wiley, Hoboken (2013) 7. Iafrate, F.: From Big Data to Smart Data. Wiley, Hoboken (2015) 8. Aggarwal, C.C.: Data Mining. Springer, Cham (2015) 9. Discroll, T.A.: Learning MATLAB. Society for Industrial and Applied Mathematics, Philadelphia (2009) 10. NASA Prognostics Center of Excellence: PCoE Datasets. https://ti.arc.nasa.gov/tech/dash/ pcoe/prognostic-data-repository/. Accessed 06 Sept 2017 11. Saxena, A., Goebel, K.: Turbofan Engine Degradation Simulation Data Set. https://ti.arc. nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository. Accessed 14 June 2018 12. Kitchin, R.: Big Data, new epistemologies and paradigm shifts. SAGE J. Big Data Soc. (2014) 13. Louridas, P., Ebert, C.: Machine learning. IEEE Softw. 33(5), 110–115 (2016) 14. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645– 678 (2005) 15. Ester, M., Sander, J.: Knowledge Discovery in Databases. Springer, Berlin (2000) 16. Shindler, M.: Approximation Algorithms for the Metric k-Median Problem. UCLA, Los Angeles (2008) 17. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SIAM (ed.) SODA 2007: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Philadelphia (2007) 18. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2017) 19. Suthaharan, S.: Machine Learning Models and Algorithms for Big Data Classi?cation. Springer, New York (2016) 20. Genuer, R., Poggi, J.-M., Tuleau-Malot, C., Villa-Vialaneix, N.: Random forests for Big Data. In: Big Data Research, pp. 22–46 (2017) 21. Schlittgen, R.: Multivariate Statistik. Oldenbourg, München (2009) 22. The MathWorks Inc.: Create and Visualize Discriminant Analysis Classi?er. https://de. mathworks.com/help/stats/create-and-visualize-discriminant-analysis-classi?er.html. Acces-sed 2018 Sep 2017 23. Nielsen, M.: Using Neural Nets to Recognize Handwritten Digits. http://neuralnetwork sanddeeplearning.com/chap1.html. Accessed 27 Mar 2018 24. The MathWorks Inc.: Tansig: Hyperbolic Tangent Sigmoid Transfer Function. https://de. mathworks.com/help/nnet/ref/tansig.html. Accessed 28 Mar 2018 25. Russell, S., Norvig, P.: Künstliche Intelligenz, 3., aktualisierte. Pearson, München (2012) 26. Kolen, J.F., Kremer, S.C. (eds.): A Field Guide to Dynamical Recurrent Networks. IEEE, New York (2001) Implementing Clustering and Classi?cation Approaches 479 27. Alpaydin, E.: Introduction to Maschine Learning. MIT Press, Cambridge (2004) 28. 
The MathWorks Inc.: Tainml: Levenberg–Marquardt Backpropagation. https://de. mathworks.com/help/nnet/ref/trainlm.html. Accessed 27 Mar 2018 29. Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11(2), 431–441 (1963) 30. Hagan, M.T., Menhaj, M.: Training feed-forward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 5(6), 989–993 (1994) 31. TIOBE: TIOBE Index for March 2018. https://www.tiobe.com/tiobe-index/. Accessed 21 Mar 2018 32. GitHut: Top Active Languages. http://githut.info/. Accessed 21 Mar 2018 33. Ramasso, E., Saxena, A.: Performance benchmarking and analysis of prognostic methods for CMAPSS datasets. Int. J. Prognstics Health Manag. 5(2), 1–5 (2014) 34. The MathWorks Inc.: Big Data Workflow Using Tall Arrays and Datastores. https://de. mathworks.com/help/distcomp/big-data-work?ow-using-tall-arrays-and-datastores.html. Accessed 27 Mar 2018 480 K. Pitz and R. Anderl Visualization Tool for JADE Platform (JEX) Halim Djerroud(B) and Arab Ali Cherif Universit´e Paris8, Laboratoire d’Informatique Avanc´ee de Saint-Denis (LIASD), 2 Rue de la libert´e, 93526 Saint-Denis, France {hdd,aa}@ai.univ-paris8.fr Abstract. This article presents JEX, a useful visualization extension to the JADE platform. JEX provides the possibility for MAS (Multi-agent systems) community using JADE to visualize and interpret their simula-tions developed under it. Why this contribution? Agent-based modeling is widely used to study complex systems. Therefore, several platforms have been developed to answer this need. However, in many platforms, the graphical representation of the environment and agents are not fully implemented. In the case of JADE, it’s completely inexistent. Implement-ing such a graphical representation within JADE is of interest because it’s a powerful multi-agent platform and FIPA compliant. Adding an extra feature like JEX will greatly help the scienti?c community and the industry to represent and interpret their MAS models. Keywords: Spatial simulation · JADE · Multi-agent systems 1 Introduction Multi-agent systems (MAS) has become an active area of research. According to Weiss [1], a multi-agent systems (MAS) is de?ned as a system involving two or more agents to cooperate with each other while achieving local goals. Multi-agent systems are acknowledged as a suitable paradigm for modeling complex systems. They are applied in various domains such as collaborative decision support sys-tems and robotics. The software development process of MAS requires robust platforms to address the complexity of these tasks by o?ering MAS key features such as agent development, monitoring and analysis. The development e?ciency can be signi?cantly enhanced using a platform able to do speci?c representation. Agent-based models [2] is the discipline aimed at understanding interaction of agents in their environment. The multi-agent system are used in two cases: (a) Simulation of complex phenomena [3] witch implies the simulation of interactions between agents. This simulation is meant to de?ne the system’s evolution in order to predict its future organization, such as the food chain study. (b) Distributed problems solving [1] such as the study of virus propagation in computer networks. The study of complex phenomena often involves entities that evolve in space and time. Implementation of these systems in an MAS requires the representation .h c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 481–489, 2019. 
https://doi.org/10.1007/978-3-030-02686-8_36 482 H. Djerroud and A. A. Cherif of the environment in which they evolve and integrate their positions within it? We refer this kind of simulation as “Agent-based spatial simulation”. Agent-based spatial simulation is a key tool and system complex study [4]. It has grown considerably among the scienti?c community, and within many social science disciplines such as psychology in case of simulating human behavior during emergency evacuations [5]. Since the modeling method used to represent systems varies according to its characteristics, it is essential to represent the environment as well as the agents developed in it. For this purpose, several multi-agent platforms have chosen to integrate a graphical interface that makes possible direct visualization of the agents’ interaction and the development of the environment. JADE1 is one of the most popular multi-user platforms [6]. This platform is widely used in research works because it implements the FIPA standard [7]. Thus, it becomes consequently easily interoperable with other platforms that implement the same standard like ZEUS [17], FIPA-OS [18], LEAP [19] and JACK [20]. Furthermore, we can state that JADE is particularly well docu-mented [8] and has a proven an impressive track record of the system that has been develops with it. Using JADE, we faced one main issue as JADE does not implement one key feature which is a spatial representation module. Thus, dedicated and rigid plateforms like GAMA [9] and Netlogo [12] seems to be more appealing as they o?ers natively this functionality. JEX (JADE Environment Extension) has been developed in order to address the lack of this key module in JADE. JEX is a spacial representation module and technically a Java library that integrates easily with JADE. This article is structured as follows: First it presents a state of the art of sev-eral Multi Agent System architectures that illustrate the interest of our contri-bution. Second describes the JEX extension as well as all provided possibilities. Finally, it compares our contribution with the tools presented in the Related Work section. We conclude the article with a discussion about the perspectives of our contribution. 2 Related Work More and more applications are developed using MAS, but there are few multi-agent oriented implementation tools and powerful agent programming languages. MAS Design relies on existing languages and programming techniques and it’s often hard to develop MAS (implementation, distribution, communications). The trend in this context takes on Multi-Agent Oriented Programming and meaning programming MAS with MAS tools. Many standards have been developed in this regard such as FIPA2 , MASIF3 and DARPA4 . In this section, we introduce 1 JADE: An open source tool available in: http://jade.tilab.com/. 2 IEEE FIPA:Foundation for Intelligent Physical Agent. 3 MASIF-OMG (Object Management Group) : OMG e?ort to standardize mobile agents - middleware services and internal middleware interfaces. 4 Knowledge Sharing E?ort The DARPA Knowledge Sharing E?ort. Visualization Tool for JADE Platform (JEX) 483 and compare some agent platform such as JADE, NetLogo, GAMA and Mason [10,11]. JADE [6] (Java Agent Development Framework) is one of the most pop-ular agent technology platform. JADE has become a major open source soft-ware project with a worldwide scope. It is an agent-oriented middleware that facilitates the development of multi-agent systems. 
It’s FIPA compliant, FIPA being an IEEE Foundation for Intelligent Physical Agents. JADE is developed in JAVA. It includes a runtime environment with JADE agents, on which one or more agents can be run from the host; a class library that programmers must/can use to develop their agents; a suite of graphical tools that allow the administration and monitoring of the activity of agents during implementation. However, JADE has no tools to visualize agents and the environment. NetLogo [12] is a multi-agent environment focused on [13,14], modeling tools. It integrates its own programming language that can be described as a high-level language. The environment is discrete and it is represented in 2D or 3D form depending on the version used. Netlogo represents the agents that are obligatory in the environment and can not communicate with the environment alone. Under Netlogo, it’s possible to depict a third type of facility referred to as links. It connects up two agents and symbolizes the relationship between agents. Gama [9] The GAMA platform (Gis & Agent-based Modelling Architec-ture) is like Netlogo, it o?ers a complete modelling language - GAMA (Gama Modelling Language) - allowing modellers to build models quickly and easily. However, unlike Netlogo which is limited to the construction of simple models, GAMA allows the construction of very complex models, as rich as those built by a computer scientist from tools such as Repast Simphony. In particular, GAMA o?ers very advanced tools for space management. Mason [15] MASON is a fast, discrete, Java-based, multi-agent simulation library designed to serve as a foundation for large customized Java simulations, and to provide su?cient utility for several soft simulation needs. MASON con-tains both a model library and an optional suite of 2D and 3D visualization tools. 3 JEX Architecture JEX is an extension visualization tool for JADE Framework, this section present JEX general architecture. The main goal is to provide JADE with an easy and e?ective viewer module like the NetLogo interface, therefore, JEX is inspired by NetLogo visual architecture and functionalities. To provide a visual representation of MAS, we need to represent agents, patches and links. Agents can act on the environment, to simplify the complex implementation of the environment, they are decomposed into small parts called Patches. The Links are relations between agents. For JEX we propose the following architecture: We consider the tree types of entities mentioned above: Agents, Links and Environment. The tree entities has been implemented as classes named JADE Agents and are named JexAgent, 484 H. Djerroud and A. A. Cherif Fig. 1. JEX architecture, UML class diagram. JexLink and JexEnvironnement, as illustrated in Fig. 1. These three classes are derived from JexGenircAgent which is simply a jade Agent superclass. We have chosen this implementation in order to take full advantage o?ered by the JADEagent Superclass functionalities and be fully compatible with the framework. – JexEnvironnement Class consist of a set of patches. The user can choose the environment dimension and global characteristics (patches size, word-warps5 , colors etc). The dimensions of the patches can also be chosen. Each element (patches) can be manipulated independently. From a technical point of view JexEnvironnement is a static class, with static members. To avoid multi-instances of environments, and ease agent access. 
Other global characteristics have been added, such as the posting delay (step-by-step execution, or the time unit of execution), the origin position (position (0,0)) of the environment, and other parameters that are fully listed in the JEX documentation (http://djerroud.halim.info/index.php/jex).
– JexAgent and JexLink are used by JexObserver, which can be considered an agent acting as a registration point for the agents willing to subscribe to the graphics representation module. JexObserver provides other services, such as creating links (Links) and initializing the environment, and it offers the JexAgentObserver interface for the agents wishing to use the graphics representation functionalities.
We insist that these various actions are completely transparent to the user, and they are performed automatically. In the next section we describe how to integrate JEX into a JADE project.
5 Word-warps: connect the edges of the environment.

4 Integration to JADE

JEX (JADE Environment Extension) comes in the form of a jex.jar Java library. This library provides JADE with a graphical environment that makes it possible to visualize the agents and the environment. The integration of JEX into a JADE project does not require any modifications of the JADE project. It needs only the creation of a JexObserver-type agent. This agent makes it possible to configure the environment, e.g., the length and the width of the environment, the refresh time and so on. If none of these parameters are specified, default values will be used. The following code shows how the JexObserver agent is created. Note that the creation of the JexObserver agent is done in the same way as the creation of a JADE agent. This is possible because JEX agents, as indicated in the previous section, are JADE agents; more precisely, they are derived from the JADE Agent class.

import jade.core.Agent;
import jex.JexEnvironnement2D;
import jex.JexObserver;

public class JexTesterAgent extends Agent {
    protected void setup() {
        JexEnvironnement2D.init2D();
        Object args[] = new Object[1];
        args[0] = "";
        ContainerController cc = getContainerController();
        try {
            AgentController ac = cc.createNewAgent("JexObs", "jex.JexObserver", args);
            ac.start();
            // ...
        } catch (Exception e) {
            // ...
        }
    }
}

In order to maintain the flexibility of JADE, the JEX library does not monitor all the agents systematically. It is up to the user to choose the agents to observe. In order to monitor an agent, the agent only needs to register with the JexObserver agent, as shown in the following code:

import jade.core.Agent;
import jex.JexAgent;
// ...

public class AgentToObserve extends Agent {
    // ...
    protected void setup() {
        jexObserver.subscribe(this.getLocalName());
        // ...
        addBehaviour(new ...Behaviour(...) {
            // ...
        });
    }
}

Once the observer agent JexObserver is created, and the agents wishing to benefit from JEX have registered with the observer, all that remains is to animate these agents. For that, JEX offers a set of functions that allow the manipulation of the various agents in the environment. Among the functions that JEX offers are the initialization functions, which set the initial position of the agent in the environment. This position can be defined by the user, or JEX can propose a random position.
Another set of functions gives a shape to the agents. The shape is defined either by a basic geometrical shape, e.g., a square, rectangle or circle, or by a specific form defined by the user via an image file. Finally, there is the set of functions that handle the movement itself. These functions can directly indicate a position to converge to, or give an orientation and a movement. Other functions specify the color of the agents, the text to display, etc. All of these functions are described in the JEX documentation. The code below gives an example of an implementation of an agent that performs initialization and basic movements.

// ...
JexAgent jexAgent = jexObserver.getJexAgent(this.getLocalName());
// ...
jexAgent.setRadius(10);
jexAgent.setShape(jexAgent.CERCLE);
jexAgent.setColor(new JexColor(200, 0, 0));
jexAgent.setInitPos(50, 50);
// ...
addBehaviour(new ...Behaviour(...) {
    protected void onTick() {
        jexAgent.setHeading(270);
        jexAgent.forward(5);
    }
});
// ...

As indicated in the previous section, JEX allows the addition of links between agents. These links (Links) are represented in the graphical environment by lines that connect the agents to each other. They are particularly useful when representing graphs. The following code shows how to add these links (Links) in JEX.

// ...
jexObserver.addLink(jexAgent.getJexAgentLocalName(), "agent attached", false);
// ...

We end this section with a graphic illustration (Fig. 2). We have chosen an example that illustrates the possibilities of JADE associated with JEX, namely an implementation of a simulation of the propagation of viruses in a computer network. The model, displayed in Fig. 2, shows the spread of a virus through a network. Although the model is somewhat abstract, the interpretation is the following: each node represents a computer, and the modeling represents the progression of a computer virus through this network. Each node has two states: infected or not. In academia, such a model is sometimes called the SIR model. The blue nodes represent the uninfected machines. The links that exist between these machines are shown as lines connecting the nodes. The red nodes represent the infected machines.

Fig. 2. Computer network, spread of viruses.

5 Discussion

The existing multi-agent platforms are more or less specialized. Consider again the example of NetLogo, which makes it possible to accomplish a great deal in terms of visual rendering and spatial representation of the agents. However, this tool is little used in the scientific world because of its lack of robustness and the specificity of its language, which limit the possibilities for working with it. JADE is written in Java and is easy to use. It implements the FIPA protocol, which makes it one of the best multi-agent platforms. However, it does not offer a graphical environment for the spatial representation of agents. Attempts to combine the two platforms have already been tested [16]. The communication between the two systems is possible via the exchange of XML files. Spatial representation is essential for the study of complex phenomena, as we have shown in Sect. 1. Integrating a spatial representation tool into the powerful JADE platform is therefore an important contribution. We have described in this article how to provide JADE with graphic capabilities comparable to those of NetLogo, which inspired us in our work.
For the future of JEX, we have developed tools for 2D representation, and we plan to add a 3D representation of the environment as well as to improve the API that we presented. We share this work using a free license; the whole source code as well as the jar ?le and the documentation can be downloaded from the link8 . 6 Conclusion In this paper, we have proposed JEX a spatial representation of MAS agents as an extension of JADE Framework. We discussed its algorithms and more impor-tantly its e?ectiveness and complementary contribution to JADE. We suppose that this easily integrated enhancement will be very bene?cial to JADE’s devel-oper community. References 1. Weiss, G.: Multiagent Systems: A Modern Approach to Distributed Arti?cial Intel-ligence. MIT Press, Cambridge (1999) 2. Vidal, J.M., Buhler, P., Goradia, H.: The past and future of multiagent systems. In: AAMAS Workshop on Teaching Multi-agent Systems (2004) 3. Amigoni, F., Schia?onati, V.: A multiagent approach to modelling complex phe-nomena. Found. Sci. 13(2), 113–125 (2008) 4. Macal, C.M., North, M.J.: Agent-based modeling and simulation: ABMS examples. In: Simulation Conference, Winter WSC 2008, p. 2008. IEEE (2008) 5. Pan, X., et al.: A multi-agent based framework for the simulation of human and social behaviors during emergency evacuations. Ai Society 22(2), 113–132 (2007) 8 http://djerroud.halim.info/index.php/jex. Visualization Tool for JADE Platform (JEX) 489 6. Bellifemine, F., Agostino, P., Giovanni, R.: JADE-A FIPA-compliant agent frame-work. In: Proceedings of PAAM, vol. 99, pp. 97–108 (1999) 7. O’Brien, P.D., Nicol, R.C.: FIPA-towards a standard for software agents. BT Tech-nol. J. 16(3), 51–59 (1998) 8. Bellifemine, F.L., Giovanni, C., Dominic, G.: Developing Multi-agent Systems with JADE, vol. 7. Wiley (2007) 9. Taillandier, P., et al.: GAMA: a simulation platform that integrates geographical information data, agent-based modeling and multi-scale control. In: International Conference on Principles and Practice of Multi-Agent Systems. Springer, Heidel-berg (2010) 10. Nguyen, G., et al.: Agent platform evaluation and comparison. Rapport technique, Institute of Informatics, Bratislava, Slovakia (2002) 11. Trillo, R., Sergio, I., Eduardo, M.: Comparison and performance evaluation of mobile agent platforms. In: Third International Conference on Autonomic and Autonomous Systems ICAS 2007. IEEE (2007) 12. Tisue, S., Uri, W.: Netlogo: a simple environment for modeling complexity. In: International Conference on Complex systems, vol. 21 (2004) 13. Tisue, S., Uri, W.: NetLogo: design and implementation of a multi-agent modeling environment. Proc. Agent (2004) 14. Kornhauser, D., Rand, W., Wilensky, U.: Visualization tools for agent-based mod-eling in NetLogo. Proc. Agent, 15–17 (2007) 15. Luke, S., et al.: Mason: a multiagent simulation environment. Simulation 81(7), 517–527 (2005) 16. Reis, J.C., Rosaldo, J.F.R., Gil, G.: Towards NetLogo and JADE Integration: an industrial agent-in-the-loop approach 17. Nwana, H.S., Ndumu, D.T., Lee, L.C.: ZEUS: an advanced tool-kit for engineering distributed multi-agent systems. In: Proceedings of PAAM, vol. 98 (1998) 18. Poslad, S., Phil, B., Rob, H.: The FIPA-OS agent platform: open source for open standards. In: Proceedings of the 5th International Conference and Exhibition on the Practical Application of Intelligent Agents and Multi-Agent, vol. 355 (2000) 19. Bergenti, F., Poggi, A.: Leap: a FIPA platform for handheld and mobile devices. In: International Workshop on Agent Theories. 
Architectures and Languages. Springer, Heidelberg (2001) 20. Winiko?, M.: JACK™ intelligent agents: an industrial strength platform. In: Multi- Agent Programming, pp. 175–193. Springer, Boston (2005) Decision Tree-Based Approach for Defect Detection and Classi?cation in Oil and Gas Pipelines Abduljalil Mohamed1(&) , Mohamed Salah Hamdi1 , and So?ene Tahar2 1 Information Systems Department, Ahmed Bin Mohamed Military College, Doha, Qatar {ajmaoham,mshamdi}@abmmc.edu.qa 2 Electrical and Computer Engineering Department, Concordia University, Montreal, Canada tahar@ece.concordia.ca Abstract. Metallic pipelines are used to transfer crude oil and natural gas. These pipelines extend for hundreds of kilometers, and as such, they are very vulnerable to physical defects such as dents, cracks, corrosion, etc. These defects may lead to catastrophic consequences if not managed properly. Thus, monitoring these pipelines is an important step in the maintenance process to keep them up and running. During the monitoring stage, two critical tasks are carried out: defect detection and defect classi?cation. The ?rst task concerns with the determination of the occurrence of a defect in the monitored pipeline. The second task concerns with classifying the detected defect as a serious or tolerable defect. In order to accomplish these tasks, maintenance engineers utilize Magnetic Flux Leakage (MFL) data obtained from a large number of magnetic sensors. However, the complexity and amount of MFL data make the detection and classi?cation of pipelines defects a dif?cult task. In this study, we propose a decision tree–based approach as a viable monitoring tool for the oil and gas pipelines. Keywords: Defect detection and classi?cation .n Decision tree Data mining .n Pipeline monitoring and maintenance 1 Introduction Oil and gas pipeline defect monitoring is an essential component of the pipeline maintenance process. In order to maintain the pipeline in a properly working order, different inspection tools such as magnetic flux leakage (MFL), ultrasonic waves, and closed circuit television (CCTV) are used to detect and classify pipeline defects [1–3]. The complexity and amount of data obtained by such diverse tools require the use of sophisticated defect detection and classi?cation techniques. Most of the approaches reported in the literature [4] have been proposed for the purpose of either prediction of defect dimensions, detection of defects, or classi?cation of defect types. To achieve © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 490–504, 2019. https://doi.org/10.1007/978-3-030-02686-8_37 these objectives, techniques such as machine learning [5–7], wavelets [8–13], and signal processing [14–16] are widely used. The focus of this paper, however, is on developing a pipeline monitoring tool that incorporates the two tasks namely: defect detection and defect classi?cation. The main inference engine for both tasks is a decision tree that takes as an input the crucial MFL depth and length parameters. 2 Pipeline Monitoring In this paper, we propose a new monitoring approach for oil and gas pipelines. The general structure of the proposed approach is shown in Fig. 1. MFL Signals. MFL data are collected from autonomous devices known as intelligent pigs. An increase in flux leakage may indicate metal loss, which in turn, means the possibility of defect occurrence. 
Thus, at the location of the potential defect, the depth and length of the flux leakage are measured or estimated by using arti?cial neural networks. Defect Detection. These two crucial MFL parameters are ?rst entered into the defect detection unit. A decision tree is realized in this unit as defect detection technique. If no defect is detected, the monitoring process terminates. On the other hand, if a pipeline defect is detected, the two parameters will be passed on to the classi?cation unit. Defect Classi?cation. In this unit, based on their severity level, the defect is classi?ed into one of two categorizes: Type I or Type II. In this work, Type I is considered a very serious pipeline defect which requires an immediate action and reparation. Type II is considered less serious and can wait and be scheduled for defect maintenance. Fig. 1. The proposed monitoring approach for the oil and gas pipelines. Decision Tree-Based Approach for Defect Detection 491 3 Decision Tree-Based Approach for Defect Detection and Classi?cation The decision tree utilized in this work is derived from the simple divide-and-conquer algorithm. The decision tree is expressed recursively as described in the following sections. MFL Signal Depth and Length Attributes. In order to detect/classify pipeline defects, the obtained MFL signals are ?rst normalized and mapped into depth and length ranges. According to the industry standard [17], the depth range for the MFL signals is normalized between 0 and 1; and the length range for the MFL signals is normalized between 0 and 6. These two ranges constitute the MFL attributes, and are divided into different values as described below. The MFL depth attribute values are: Very high = [0.80 1.00], High = [0.60 0.79], Medium = [0.40 0.59], Low = [0.20 0.39], Very low = [0.00 0.19], The MFL length attribute values are: Large = [3.81 6.00], Medium = [1.81 3.80], Small = [0.61 1.80], Very small = [0.00 0.60], Defect Detection. Based on the information given in [17], the MFL attributes can now be used to identify the status of the MFL signals as shown in Table 1. The MFL signal can either be identi?ed as abnormal (defect) or normal. Constructing Decision Tree. To construct the decision tree for the defect detection, an attribute is ?rst selected and placed at the root node, and make branch for each possible value. This splits up the MFL signals into subsets, one for every value of the attribute. The process is repeated recursively for each branch, using only those instances that actually reach the branch. If all instances at a particular node are all either abnormal or normal, then we stop developing that part of the tree. There are two possibilities for each split; and they produce two trees as shown in Figs. 2 and 3 for the depth and length attributes, respectively. The number of 2 (abnormal) and 1 (normal) classes is shown at the leaves. Any leaf with only one class (i.e., 2 or 1) reaches the ?nal split; and thus the recursive process terminates. In order to reduce the size of the trees, the information gain for each node is measured. Now the information for the two attributes is calculated and split is made on the one that gains the most information. Tree Structure. The informational value of creating a branch on the MFL-depth attribute and the MFL-length attribute are then calculated. The number of normal and abnormal at the leaf nodes in Fig. 2 are [0 4], [1 3], [2 2], [2 2], and [4 0], respectively. 492 A. Mohamed et al. 
The number of normal and abnormal at the leaf nodes in Fig. 3 are [4 1], [3 2], [1 4], and [1 4], respectively. Calculating the information gain for each attribute yields the tree structure shown in Fig. 4. As described in Fig. 5, the decision tree basically uses three values of the MFL-depth attribute and four values of the MFL-length attribute. The values are Low, Medium, and High for the MFL-depth attribute, and Very Small, Small, Medium, and Large for the MFL-length attribute. Table 1. MFL signal abnormal and normal status based on its depth and length range MFL-depth MFL-length Status Very high High Medium Low Very low Very small Small Medium Large Normal (1) Abnormal (2) YES NO NO NO NO YES NO NO NO NO YES YES NO NO NO NO NO YES NO NO NO YES YES NO NO NO NO NO NO YES NO NO YES YES NO NO NO NO NO NO NO YES NO YES NO YES NO NO NO YES NO NO NO YES NO NO YES NO NO NO NO YES NO NO NO YES NO YES NO NO NO NO NO YES NO NO YES NO YES NO NO NO NO NO NO YES NO YES NO NO YES NO NO YES NO NO NO YES NO NO NO YES NO NO NO YES NO NO YES NO NO NO YES NO NO NO NO YES NO NO YES NO NO YES NO NO NO NO NO YES NO YES NO NO NO YES NO YES NO NO NO YES NO NO NO NO YES NO NO YES NO NO YES NO NO NO NO YES NO NO NO YES NO NO YES NO NO NO YES NO NO NO NO YES NO YES NO NO NO NO YES YES NO NO NO YES NO NO NO NO NO YES NO YES NO NO YES NO NO NO NO NO YES NO NO YES NO YES NO NO NO NO NO YES NO NO NO YES YES NO Fig. 2. The decision tree for the MFL depth attribute. The abnormal status is referred to by 2; while the normal status is referred to by 1. Decision Tree-Based Approach for Defect Detection 493 Defect Classi?cation. The MFL data used for classifying the defect severity level is shown in Table 2. The table shows that the two attribute values can indicate either the defect level is of Type I, or the defect level is of Type II. Fig. 3. The decision tree for the MFL length attribute. The abnormal status is referred to by 2; while the normal status is referred to by 1. Fig. 4. The decision tree structure for the defect detection. 494 A. Mohamed et al. Constructing Decision Tree. The two trees produced by the two attributes are shown in Figs. 6 and 7. As was the case for the defect decision tree, the information gain for each node is measured, and split is made on the one that gains the most information. Tree Structure. The informational value of creating a branch on the MFL-depth attribute and the MFL-length attribute are then calculated. The number of defect Type I and Type II at the leaf nodes in Fig. 6 are [2 1], [1 2], and [0 3], respectively. The number of defect type I and type II at the leaf nodes in Fig. 7 are [0 3], [1 2], and [2 1], respectively. Calculating the information gain for each attribute yields the tree structure shown in Fig. 8. As described in Fig. 9, the decision tree basically uses three values of the MFL-depth attribute and three values of the MFL-length attribute. The values are Low, Medium, and High for the MFL-depth attribute, and Small, Medium, and Large for the MFL-length attribute. Fig. 5. The defect detection based on the two MFL attributes. Table 2. 
MFL signal defect (i.e., Type I, Type II) status based on its depth and length range MFL-depth MFL-length Defect High Medium Low Small Medium Large Type I (1) Type II (2) YES NO NO YES NO NO NO YES YES NO NO NO YES NO YES NO YES NO NO NO NO YES YES NO NO YES NO YES NO NO NO YES NO YES NO NO YES NO NO YES NO YES NO NO NO YES YES NO NO NO YES YES NO NO NO YES NO NO YES NO YES NO NO YES NO NO YES NO NO YES NO YES Decision Tree-Based Approach for Defect Detection 495 Fig. 6. The decision tree for the MFL-depth attribute. The defect status of type I is referred to by 1; while type II is referred to by 2. Fig. 7. The decision tree for the MFL-length attribute. The defect status of type I is referred to by 1; while type II is referred to by 2. Fig. 8. The decision tree structure for the defect classi?cation. 496 A. Mohamed et al. 4 Performance Evaluation The performance of the proposed approach is measured by two important criteria: the receiver operating characteristics (ROC) curves and the confusion matrices. In ROC, the true positive rates (sensitivity) are plotted against the false positive rates (1- speci?city) for different cut-off points. For a speci?c severity class, the closer its ROC curve is to the left upper corner of the graph, the higher its classi?cation accuracy is. In the confusion matrix plot, the rows correspond to the predicted class (output class), and the columns show the true class (target class). In the defect detection and classi?cation, the proposed approach is compared with the four well-known classi?ers, namely the Naive Bayesian (NB) classi?er, k-nearest neighbor (KNN) classi?er, Arti?cial Neural Network (ANN) classi?er, and the Support Vector Machine (SVM) classi?er. Data. The available MFL dataset used in the experimental work is categorized as follows. For the defect detection, there are 907 samples of normal status, and 2721 samples of the abnormal status. For the defect classi?cation, there are 907 samples for each type of defects. The data samples have been further divided as follows: 70% for training, 15% for validation, and 15% for testing. Defect Detection. The confusion matrix and the ROC curves for each detector model are shown in Figs. 10, 11, 12, 13 and 14 for the models NB, KNN, ANN, SVM, and the proposed decision tree (DT). In these ?gures, the normal status of the MFL signal is referred to by Class 1, and abnormal status is referred to by Class 2. Fig. 9. The defect classi?cation based on the two MFL attributes. Decision Tree-Based Approach for Defect Detection 497 Defect Classi?cation. The confusion matrix and the ROC curves for each classi?er model are shown in Figs. 15, 16, 17, 18 and 19 for the models NB, KNN, ANN, SVM, and the proposed decision tree (DT). In these ?gures, the defect type is referred to by Class 1, and defect Type II is referred to by Class 2. Fig. 10. The defect detection confusion matrix (a) and ROC curves (b) for the NB model. Fig. 11. The defect detection confusion matrix (a) and ROC curves (b) for the KNN model. 498 A. Mohamed et al. It should be noted from these ?gures that the proposed DT model outperforms all other models. It yields 99.2% accuracy for the detection and classi?cation. Moreover, the arti?cial neural network model yields the worst performance at 70.2% detection accuracy and 71.4% classi?cation accuracy. The defect detection and classi?cation performance of all models are summarized in Table 3. Fig. 12. The defect detection confusion matrix (a) and ROC curves (b) for the ANN model. Fig. 13. 
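The evaluation protocol described above (a 70/15/15 train/validation/test division, confusion matrices, and ROC curves for each model) can be sketched as follows. This is an illustrative MATLAB sketch only; the paper does not state its implementation environment, and the names X (the normalized MFL depth and length attributes) and labels (status 1 = normal, 2 = abnormal) are assumptions. A simple hold-out split stands in here for the full 70/15/15 division.

% Illustrative sketch, not the authors' code.
cv   = cvpartition(labels, 'HoldOut', 0.30);                 % hold out 30% (validation + test)
mdl  = fitctree(X(training(cv),:), labels(training(cv)));    % decision tree; fitcnb, fitcknn, fitcsvm for other models
[pred, score] = predict(mdl, X(test(cv),:));

C = confusionmat(labels(test(cv)), pred);                    % rows: true class, columns: predicted class
accuracy = sum(diag(C)) / sum(C(:));
[fpr, tpr, ~, auc] = perfcurve(labels(test(cv)), score(:,2), 2);   % ROC for the abnormal class (Class 2)
plot(fpr, tpr), xlabel('False positive rate'), ylabel('True positive rate')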
The defect detection confusion matrix (a) and ROC curves (b) for the SVM model. Decision Tree-Based Approach for Defect Detection 499 Fig. 14. The defect detection confusion matrix (a) and ROC curves (b) for the DT model. Fig. 15. The defect classi?cation confusion matrix (a) and ROC curves (b) for the NB model. 500 A. Mohamed et al. Fig. 16. The confusion matrix (a) and ROC curves (b) for the KNN model. Fig. 17. The defect classi?cation confusion matrix (a) and ROC curves (b) for the ANN model. Decision Tree-Based Approach for Defect Detection 501 Fig. 18. The defect classi?cation confusion matrix (a) and ROC curves (b) for the SVM model. Fig. 19. The defect classi?cation confusion matrix (a) and ROC curves (b) for the DT model. Table 3. Detection and classi?cation accuracy for the NB, KNN, ANN, SVM, and DT models. Classi?er model Defect Detection Classi?cation NB 87% 83.8% KNN 98.8% 96.8% ANN 70.2 71.4% SVM 89.5% 90% DT 99.2% 99.2% 502 A. Mohamed et al. 5 Conclusion The monitoring process for the oil and gas pipelines consists of two main tasks: defect detection and defect classi?cation. The complexity and amount of the MFL monitoring data make both tasks very dif?cult. In this work, we proposed a decision tree-based approach as a viable monitoring tool. The new approach is evaluated using two important criteria: the receiver operating characteristics (ROC) curves and the confu-sion matrices. The performance of the new approach is compared with other well-known monitoring tools. Extensive experimental work has been carried out and the performance of the proposed approach along with four other well-known techniques are reported. The new approach outperforms all of them with accuracy at 99.2% for the detection and classi?cation tasks. Acknowledgment. This work was made possible by NPRP Grant # [5-813-1-134] from Qatar Research Fund (a member of Qatar Foundation). The ?ndings achieved herein are solely the responsibility of the authors. References 1. Park, G.S., Park, E.S.: Improvement of the sensor system in magnetic flux leakage-type nod-destructive testing. IEEE Trans. Magn. 38(2), 1277–1280 (2002) 2. Jiao, J., et al.: Application of ultrasonic guided waves in pipe’s NDT. J. Exp. Mech. 1, 000 (2002) 3. Jiao, J., et al.: Application of ultrasonic guided waves in pipe’s NDT. J. Exp. Mech. 17(1), 1–9 (2002) 4. Layouni, M, Tahar, S., Hamdi, M.S.: A survey on the application of neural networks in the safety assessment oil and gas pipelines. In: 2014 IEEE Symposium on Computational Intelligence for Engineering Solutions. IEEE (2014) 5. Khodayari-Rostamabad, A., et al.: Machine learning techniques for the analysis of magnetic flux leakage images in pipeline inspection. IEEE Trans. Magn. 45(8), 3073–3084 (2009) 6. Lijian, Y., et al.: Oil-gas pipeline magnetic flux leakage testing defect reconstruction based on support vector machine. In: Second International Conference on Intelligent Computation Technology and Automation, ICICTA 2009, vol. 2. IEEE (2009) 7. Vidal-Calleja, T., et al.: Automatic detection and veri?cation of pipeline construction features with multi-modal data. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014). IEEE (2014) 8. Song, S., Que, P.: Wavelet based noise suppression technique and its application to ultrasonic flaw detection. Ultrasonics 44(2), 188–193 (2006) 9. Hwang, K., et al.: Characterization of gas pipeline inspection signals using wavelet basis function neural networks. NDT E Int. 33(8), 531–545 (2000) 10. 
Mukhopadhyay, S., Srivastava, G.P.: Characterisation of metal loss defects from magnetic flux leakage signals with discrete wavelet transform. NDT E Int. 33(1), 57–65 (2000) 11. Han, W., Que, P.: A modi?ed wavelet transform domain adaptive FIR ?ltering algorithm for removing the SPN in the MFL data. Measurement 39(7), 621–627 (2006) 12. Joshi, A., et al.: Adaptive wavelets for characterizing magnetic flux leakage signals from pipeline inspection. IEEE Trans. Magn. 42(10), 3168–3170 (2006) Decision Tree-Based Approach for Defect Detection 503 13. Qi, S., Liu, J., Jia, G.: Study of submarine pipeline corrosion based on ultrasonic detection and wavelet analysis. In: 2010 International Conference on Computer Application and System Modeling (ICCASM), vol. 12. IEEE (2010) 14. Afzal, M., Udpa, S.: Advanced signal processing of magnetic flux leakage data obtained from seamless gas pipeline. NDT E Int. 35(7), 449–457 (2002) 15. Guoguang, Z., Penghui, L.: Signal processing technology of circumferential magnetic flux leakage inspection in pipeline. In: 2011 Third International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), vol. 3. IEEE (2011) 16. Kandroodi, M.R., et al.: Defect detection and width estimation in natural gas pipelines using MFL signals. In: 2013 9th Asian Control Conference (ASCC). IEEE (2013) 17. Cosham, A., Hopkins, P., Macdonald, K.A.: Best practice for the assessment of defects in pipelines—corrosion. Eng. Fail. Anal. 14(7), 1245–1265 (2007) 504 A. Mohamed et al. Impact of Context on Keyword Identi?cation and Use in Biomedical Literature Mining Venu G. Dasigi1(?) , Orlando Karam2 , and Sailaja Pydimarri3 1 Bowling Green State University, Bowling Green, OH, USA vdasigi@bgsu.edu 2 Kennesaw State University, Marietta, GA, USA orlando.karam@gmail.com 3 Life University, Marietta, GA, USA sailaja.pydimarri@life.edu Abstract. The use of two statistical metrics in automatically identifying important keywords associated with a concept such as a gene by mining scien- tific literature is reviewed. Starting with a subset of MEDLINE® abstracts that contain the name or synonyms of a gene in their titles, the aforementioned metrics contrast the prevalence of specific words in these documents against a broader “background set” of abstracts. If a word occurs substantially more often in the document subset associated with a gene than in the background set that acts as a reference, then the word is viewed as capturing some specific attribute of the gene. The keywords thus automatically identi?ed may be used as gene features in clustering algorithms. Since the background set is the reference against which keyword prevalence is contrasted, the authors hypothesize that di?erent back- ground document sets can lead to somewhat di?erent sets of keywords to be identi?ed as speci?c to a gene. Two di?erent background sets are discussed that are useful for two somewhat di?erent purposes, namely, characterizing the func- tion of a gene, and clustering a set of genes based on their shared functional similarities. Experimental results that reveal the signi?cance of the choice of background set are presented. 
Keywords: Literature mining · Automatic keyword identi?cation · TF-IDF Z-score · Background set · Features · Clustering 1 Objectives and Goals The usefulness of certain text mining approaches for automatic identification of keywords associated with documents and using those keywords for additional anal- ysis, such as classification and clustering of documents, have been studied previ- ously [1, 4, 7]. Keywords are identified by the strength of their association with documents or document classes, such as tweets [4] or research abstracts associated with specific genes [1, 7]. Keywords thus identified are used as features for addi- tional purposes, such as classification of tweets based on sentiment [4] or organ- izing genes into groups or clusters based on functional similarity [1, 7]. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 505–516, 2019. https://doi.org/10.1007/978-3-030-02686-8_38 The strength of association a keyword has to a document or a collection is generally not determined in isolation or absolute terms, but within the context of its contrast to its strength in a reference or “background” set of documents. In this work, the focus is on the signi?cance of the context provided by the background set. The objective in this work is to understand the impact of context, provided by such a background collection of documents, in text mining to describe the function of a set of genes, and in explicating possible similarities in their function by grouping them into clusters. The task of clustering genes is carried out as a two-step process: First, keywords speci?c to each gene of interest are algorithmically extracted from a subset of MEDLINE® documents, based on two metrics: Z-score [1], a well-known statistical concept, and TF-IDF [8], a classic term weight metric from information retrieval. The formulation of these metrics helps identify how important and distinguishing a keyword is for a particular gene. In the second step of clustering, the classic K-means algorithm [5] is used to group related genes based on the keyword features into clusters. Each of these clusters is interpreted as comprised of functionally related genes, as indicated by the keywords the genes in each share among themselves. To achieve these stated goals, the extracted keywords should represent two aspects of the genes: they should be su?ciently speci?c as to characterize the gene and at the same time, some of them should be shared among multiple genes so the genes may be organized into functionally related clusters. To capture these two aspects for keywords, two di?erent background sets of documents are needed to provide a reference context. Others have evaluated strengths of di?erent features relative to the same background set, as Ikeda and Suzuki did in identifying peculiar composite strings as in DNA sequences [6]. However, few others have attempted to understand the impact of di?erent background sets in identifying keywords that are used for di?erent purposes. As pointed out above, keywords for a concept, such as a gene in this work, are identi?ed based on the strength of their association with the concept. Two alternative metrics, namely, Z-scores and a less explored variant of TF-IDF (de?ned below), are considered in this work to capture the strength associated with keywords. The quality of keywords extracted for some genes from each metric is evaluated by an expert. 
The quality of clusters resulting from K-means is evaluated by calculating the purity of clusters, which measures the overall similarity of the computed clusters of genes against expert-defined clusters [2].

2 Methods

Keywords capture and represent the content of documents, such as biomedical abstracts. Keywords that appear more often in a document are considered more likely to be representative of the content of the document. This ability of a keyword to represent the content of a document is called the representation aspect. Useful keywords also need to be able to distinguish between documents. A word that occurs in most documents obviously cannot distinguish among those documents. This ability of a keyword to discriminate between documents will be called the discrimination aspect. Thus, a word that occurs in only a few documents, i.e., one with a low document frequency, can set the small number of documents in which it does occur apart from the many in which it does not. A word that rates well in both the representation aspect and the discrimination aspect would thus be a good keyword.

When a concept, such as a gene, is captured by a set of documents, it is useful to extend these notions from a single document to a group of documents [3]. Thus, a keyword may be thought of as characterizing a group of documents (related to a specific concept, such as a gene) and as distinguishing the group from other groups (related to other concepts, such as other genes). In this extended view, the keyword may also be viewed as characterizing the concept itself, such as a gene (which underlies the group of documents), and as distinguishing it from other concepts or genes (which underlie the other groups of documents). In order to capture the representation and discrimination aspects of keywords relative to various concepts, the distribution of the keyword across the various (possibly overlapping) groups of documents, which correspond to the concepts in question, is of interest.

TF-IDF has traditionally focused on the representation and discrimination aspects of keywords relative to individual documents in information retrieval [8]. Andrade and Valencia, and others following them, have used the Z-score more naturally to capture the distribution of a keyword within groups of documents [1]. The Z-score is thus directly suitable for capturing the representation and discrimination aspects of keywords relative to groups of documents, and the concepts underlying them, as Andrade and Valencia did with protein families. In order to take advantage of the powerful notion of TF-IDF, while adapting it to the context of concepts represented by groups of documents, the original definition is improvised here. A brief definition of the Z-score is presented first, followed by a discussion of an improvised variant of TF-IDF that extends to groups of documents.

2.1 The Z-Score

Well known in statistics, the Z-score of a word a relative to a gene (or other concept) g is defined as follows, where F stands for a frequency that simply counts the number of documents containing a word:

Z_a^g = (F_a^g − F̄_a) / σ_a,

where F_a^g, F̄_a, and σ_a all relate to the word a, and are, respectively, the frequency (the number of documents that contain the word a, as mentioned above) in the group corresponding to the gene g, the average frequency across the groups corresponding to all genes of interest, and the standard deviation of the frequency across the groups of documents corresponding to all genes of interest.
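The Z-score just defined is straightforward to compute. The sketch below (illustrative Python, not the authors' code; the toy data, function names, and the use of the population standard deviation are assumptions, since the paper does not spell these out) counts document frequencies per gene group and derives the Z-score of a word for each gene against the set of all groups.

```python
# Illustrative sketch: Z-score of a word relative to each gene, where each gene
# is represented by a group of documents (each document a list of tokens).
from statistics import mean, pstdev

def document_frequencies(word, groups):
    # F_a^g: number of documents in each gene's group that contain the word
    return {gene: sum(word in doc for doc in docs) for gene, docs in groups.items()}

def z_scores(word, groups):
    freqs = document_frequencies(word, groups)
    avg = mean(freqs.values())    # average frequency across all groups (background set)
    sd = pstdev(freqs.values())   # standard deviation across all groups (assumption: population s.d.)
    if sd == 0:                   # word equally (in)frequent everywhere
        return {gene: 0.0 for gene in freqs}
    return {gene: (f - avg) / sd for gene, f in freqs.items()}

# Toy example (hypothetical documents):
groups = {
    "ace2": [["cell", "cycle", "ace2"], ["ace2", "transcription"]],
    "cdc21": [["dna", "replication"], ["cdc21", "cell"]],
}
print(z_scores("ace2", groups))
```

As in the definition, the mean and standard deviation are taken over exactly the set of groups that plays the role of the background set.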
While the standard deviation plays a useful role in defining the Z-score, it is not a focus of this paper. Thus, the Z-score is a measure of how many standard deviations away the frequency of the word in a group of documents corresponding to a given gene is from the average frequency of the word across the set of various groups of documents corresponding to all the genes that are of interest; for instance, a Z-score of 3 means that the frequency in question is 3 standard deviations above average. The set of the various groups of documents corresponding to all the different genes of interest, used as a reference against each individual group of documents that corresponds to a specific gene, is referred to as the background set. The need for selecting appropriate background sets for different purposes is discussed in Sect. 2.3.

2.2 TF-IDF and Its Variant TF-IGF

The TF-IDF score is classic and well known in the information retrieval literature, and has been used to capture the strength of individual words to characterize documents and distinguish them from other documents [8]. In contrast, in the present work, keywords are of interest that distinguish a gene from other genes (or a concept from other similar concepts). The authors have previously extended the notion of TF-IDF to this new context [3]. Since the extension is not as well known as the Z-score, it is briefly explained here.

The entire document collection may be thought of as comprising (not necessarily exhaustively) a number of possibly overlapping groups of documents corresponding to the different genes that are of interest. There may also be other documents that are unrelated to any of these genes; thus the overlapping groups of documents do not necessarily exhaust the entire document collection. Here the focus is on characterizing the representational and discriminating aspects of words relative to each gene (which corresponds to a group of documents), and not relative to each document (as was the focus of TF-IDF). The extension involves defining the term frequency TF_a^g of a term a relative to a gene (represented by a group of documents) g, the group frequency of a term a (similar to the document frequency of a term), denoted GF_a, the inverse group frequency of a term a, denoted IGF_a, and finally the combined notion TF-IGF_a^g, the group variant of TF-IDF, that brings all the pieces together. TF_a^g is defined as the sum of the number of times the word a appears in the documents corresponding to the gene g (Footnote 1), that is,

TF_a^g = Σ_{d ∈ g} tf_a^d,

(Footnote 1: This is also sometimes called the collection frequency of the term in the set of documents, and counts the total number of occurrences of the term in all the documents of the collection. It differs from the document frequency of a term in a collection of documents in that the document frequency just counts how many documents contain the term, with no distinction on the number of occurrences.)
where g is used to refer to a gene, as well as to the group of documents associated with it. The summation is over all documents d associated with the gene, or group of documents, g, and tf_a^d is the frequency of the term a in d. GF_a is defined simply as the number of genes, or groups of documents, that include at least one document containing the word a. Here G denotes the entire set of genes or groups of documents:

GF_a = Σ_{g ∈ G} [1 if there exists a document d ∈ g in which a appears; 0 otherwise].

IGF_a is defined much like the classic IDF:

IGF_a = log(|G| / GF_a),

where |G| is the cardinality of the set of gene groups (44 in the present work with yeast genes). Finally, TF_a^g and IGF_a are multiplied to form

TF-IGF_a^g = TF_a^g · IGF_a.

Above, G has denoted the entire set of genes or groups of documents used in computing the inverse group frequency IGF_a of a word a. This component is intended to capture the aspect of keywords that can distinguish a gene associated with a particular group of documents from all genes and the document groups associated with them. As in the case of the Z-score, this entire set of groups G, used as the reference against which individual groups are contrasted, is called the "background set" here. The significance of the background set is discussed in more detail in the next subsection.

2.3 The Background Set

In the definitions of both the Z-score and TF-IGF, the reference set of the various groups of documents corresponding to all the different genes that are of interest has been called the "background set". The background set is roughly the universe of interest. The focus is on how a word can distinguish a "foreground" set of documents, which corresponds to a specific gene, from a background set of documents, which corresponds to all the genes, and possibly all other concepts at large. Since each gene corresponds to a group of documents, the term "gene" and the phrase "group of documents" are sometimes used interchangeably when it suits the context (often with the symbol g left ambiguous between a gene and a group of documents).

The Z-score tries to capture how the frequency of a word (in a specific group of documents corresponding to a "foreground" gene) deviates, in terms of standard deviations, from the average frequency of the word in the groups of documents corresponding to the background set of genes. Thus, the average frequency F̄_a and the standard deviation σ_a are both computed from the background set. If a word is contained in only one group of documents corresponding to a specific gene, then the average frequency of the word in the background set would be very small, so the word would have a high Z-score for that gene, and potentially negative Z-scores for all other genes. This in turn captures the notion that the word is very significant for that particular gene. Thus it helps in distinguishing the gene from others, and possibly in capturing part of its functional description. TF-IGF attempts to capture how high the frequency of a word in a specific group of documents corresponding to a "foreground" gene is, while the word occurs relatively infrequently in the groups of documents corresponding to the background set of genes.
For any given word a, the first aspect is captured by a high TF_a^g for a specific group of documents corresponding to a gene g, and the second aspect is captured by a high IGF_a in the background set. As with the Z-score, if a word a has a high TF-IGF_a^g for a gene g, the word helps distinguish the gene from others, possibly capturing part of its functional description.

Keywords identified for a gene using the Z-score or TF-IGF could conceivably serve at least two distinct purposes. They could be used to characterize or describe the function of the gene as uniquely or distinctly as possible. Here, the focus would be on distinguishing each gene from the others. Alternatively, the keywords might be used to identify possible functional similarities and overlaps between the different genes (indicated by possibly shared functional keywords). In this case, it would be desirable to see the keywords capture as much of the functionality of each gene as possible, rather than emphasize their distinction from other genes.

It appears that the specific choice of background set can impact the appropriateness of the keywords selected for the gene for the two distinct purposes discussed above. In order to obtain keywords that uniquely characterize a gene, the keywords should be associated with the gene in question, but not with any or most of the other genes. A natural background set for this purpose would be one that includes groups of documents that correspond to each of the genes that are of interest, and no others. Every document in the background set would be associated with one or more of the genes being studied that we seek to distinguish from one another. There would be no documents in the background set that are unrelated to one gene or another from the set of genes being studied. In the rest of the paper, this background set of documents is referred to as the restricted background set.

On the other hand, suppose the focus were instead on grouping the various genes from the set being studied into clusters based on similarities of function, indicated by any keywords associated with each gene that are shared with at least one other gene. In this scenario, what would be very useful is to allow keywords identified for different genes to overlap somewhat, indicating potential similarities in function between pairs of genes, based on any keywords the pair shares. For this purpose, a background set such as the one described in the preceding paragraph would be inappropriate, because it tends to focus on distinguishing the various genes, rather than on whether they could be similar. A different background set that includes many general documents (including other biomedical documents, possibly not about any of the genes being studied) might provide a broader and more neutral reference. For instance, the entire MEDLINE® document collection, which includes many documents that are not necessarily about any of the genes in question, could be such a background set. This kind of background set is naturally called unrestricted.

In this work, a restricted background set and an unrestricted background set are created for use in identifying slightly different keyword sets for each gene. The hypothesis, to be verified, is that the former background set is more suitable for selecting keywords that are better for characterizing gene function uniquely, while the latter is more appropriate for selecting keywords used as gene features for functional clustering of genes.
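To make the role of the background set concrete, the following sketch (illustrative Python, not the authors' implementation; the function names and data structures are assumptions) computes TF-IGF for a word and a gene against a caller-supplied background set, so switching between a restricted and an unrestricted background set amounts only to passing a different collection of document groups.

```python
# Illustrative sketch: TF-IGF of a word for a gene, computed against a chosen
# background set of document groups (each document a list of tokens).
import math

def tf(word, docs):
    # TF_a^g: total number of occurrences of the word in the gene's documents
    return sum(doc.count(word) for doc in docs)

def igf(word, background_groups):
    # GF_a: number of groups containing the word at least once; IGF_a = log(|G| / GF_a)
    gf = sum(any(word in doc for doc in docs) for docs in background_groups.values())
    # Simplifying assumption: a word absent from every background group gets IGF 0.
    return math.log(len(background_groups) / gf) if gf else 0.0

def tf_igf(word, gene, foreground_groups, background_groups):
    return tf(word, foreground_groups[gene]) * igf(word, background_groups)
```

Under the restricted setting, background_groups would be the 44 gene groups themselves; under the unrestricted setting, it would be a random partition of the whole corpus into 44 groups, as described in the text that follows.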
The restricted background set is simply formed from the 44 groups of documents that correspond to the 44 yeast genes. It is simply the union of all these documents, 2,233 in total. The unrestricted background set is the entire collection of 6,791,729 MEDLINE® abstracts (which is a superset of the restricted set, since they were all downloaded at the same time). This entire set is divided randomly into 44 groups, so as to keep the methodology consistent and comparable.

3 Results and Analysis

As indicated before, a set of 44 genes that are involved in the cell cycle of budding yeast has been chosen for this study, since others have studied them as well. For example, Cherepinsky et al. includes a study specifically on gene clustering, where they also include an expert-defined clustering based on functions and transcriptional activators [2]. In this work, that same expert-defined clustering (not shown here) is used as the basis for comparison of the quality of clustering.

Using both TF-IGF and Z-score with context provided by the restricted and unrestricted background sets, the N top-ranking keywords were generated by varying N from 10 to 100 for each gene. Thus, four combinations of experiments in all were performed for generating keywords and for clustering genes. The top 30 keywords generated by both TF-IGF and Z-scores for three different genes were evaluated by an expert. Using the top N keywords as features, the K-means algorithm was used to compute gene clusters [5]. The flow of data for the computations of Z-score, TF-IGF, and K-means is illustrated in Fig. 1. For the 44 yeast genes in consideration, the purity of the computed clustering was evaluated relative to the expert-generated clustering from Cherepinsky et al. [2], as mentioned at the beginning of this section. A clustering is a set of sets of genes. Each inner set of genes is sometimes called a cluster; thus a clustering is a set of clusters. Purity is calculated by first computing, for each inner set of genes in the computed clustering, the best degree of match against any inner set of genes in the expert clustering, and then averaging this measure over all inner sets in the clustering.

For clustering purposes, once all the keywords for each of the 44 genes were identified, any keywords that are unique to each gene, that is, those not shared by at least two genes, were eliminated. Note that these eliminated words are very important for an entirely different purpose, namely, to describe the potentially unique functional aspects of the respective genes, although they are not that useful for clustering by K-means. A little fewer than half of the total keywords were unique and eliminated for clustering purposes, leaving a little over half of the total keywords shared by at least two genes.

3.1 Impact of Different Background Sets – Keyword Quality

An expert was asked to evaluate the top 30 ranking keywords for three genes, namely ace2, cdc21, and mnn1, from all four combinations of experiments. Not surprisingly, the name of the gene itself is ranked at the top in most cases. According to the expert, keywords obtained using TF-IGF were better than those based on Z-scores. Contrary to initial expectation, in the first cut, the quality of the keywords did not appear to depend significantly on the background set, although there were differences. However, an interesting observation was made for ace2, which is the name of both a yeast gene and a human gene.
When Z-scores were computed using the restricted background set, more keywords related to the cell cycle function of the human gene (renal activity) were selected than with the unrestricted background set. This surprising result has an interesting explanation: the restricted background set results in keywords that are less likely to be shared between the different genes, and keywords related to human functions of ace2 are less likely to be shared by other yeast genes. This expectation was at the heart of the rationale behind the original hypothesis that keywords selected with the smaller, restricted background set are better for defining the functions of the genes, with a particular focus on their distinctness relative to all the other genes represented in the background set, while those selected with the larger, unrestricted background set are better for clustering! Space considerations prohibit listing the keywords identified for the genes under the four combinations.

Fig. 1. The entire MEDLINE corpus constitutes the unrestricted background set; the restricted background set is the subset of documents that corresponds to the 44 yeast genes. Four sets of keywords are computed, based on Z-scores using the restricted background set (K-Z-R) and the unrestricted one (K-Z-U), as well as based on TF-IGF using the restricted background set (K-T-R) and the unrestricted one (K-T-U). Eventually four clusterings (C-Z-R, C-Z-U, C-T-R, and C-T-U) are computed by K-means using these respective sets of keywords.

3.2 Impact of Different Background Sets – Functional Clustering of Genes

After identifying sets of keywords associated with the genes of interest, the keywords that are shared by more than one gene were used as features that form the basis of K-means clustering. An initial set of clustering experiments was conducted separately with the values of TF-IGF and Z-score that are associated with each keyword as feature weights for the clustering algorithm. Those initial experiments produced clustering results that were not particularly meaningful or interesting, prompting the authors to switch to using simple binary weights for the features (keywords) instead. The binary weights were defined as 1 if the word appears in at least one document associated with the gene set, and 0 otherwise. Simplistic as these weights are, intuitively, binary weights on keywords in the context of the clustering algorithm capture the notion of shared keywords. Experiments were repeated based on the 10, 20, 30, 50, 70, and 100 top-ranking keywords for each gene from each of the lists generated by TF-IGF and Z-scores. Tables 1 and 2 show the purity results for the clusters computed by the K-means algorithm based on keywords generated using TF-IGF and Z-scores, respectively, both within the context of the restricted background set. In the bottom row, the tables also show the total number of distinct keywords used by the clustering algorithm across all 44 genes.

Table 1. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs, with keywords based on TF-IGF computed in the context of the restricted background set.

              Top 10  Top 20  Top 30  Top 50  Top 70  Top 100
Micro purity  0.636   0.659   0.682   0.562   0.500   0.546
Macro purity  0.707   0.723   0.742   0.643   0.559   0.567
Keywords      315     600     830     1383    1833    2530

Table 2. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs, with keywords based on Z-scores computed in the context of the restricted background set.
              Top 10  Top 20  Top 30  Top 50  Top 70  Top 100
Micro purity  0.409   0.477   0.477   0.432   0.432   0.409
Macro purity  0.455   0.523   0.511   0.496   0.489   0.443
Keywords      475     1010    1524    2280    2888    3623

Purity values are averaged across all clusters in two possible ways. As mentioned before, each clustering (which is a set of sets of genes) may be viewed as a set of clusters, where each cluster is a set of genes. Macro-averaging involves simply averaging the purities of individual clusters across all clusters of a clustering. Since the purity of each cluster is a ratio, the alternative technique of micro-averaging (which is not really a kind of averaging in the mathematical sense) involves taking the ratio of the sum of numerators and the sum of denominators, without reducing any of the individual ratios. The micro purity and macro purity rows in the tables refer to the micro-averaged purity and macro-averaged purity across the clusters of the computed clustering.

From Tables 1 and 2, it is interesting to notice that, when the restricted background set is used in computing the metrics, the purity of the clusters based on keywords identified using TF-IGF is substantially better than that relating to Z-scores. The results with TF-IGF are better in terms of both higher purity and fewer keywords than with Z-scores! Fewer features allow for faster clustering.

The experiments were continued with the unrestricted background set to compute the TF-IGF and Z-scores, and select keywords based on those computations. The unrestricted background set has a much larger number of documents. The documents were divided randomly into 44 groups for calculating the IGF and Z-scores. The 10, 20, 30, 50, 70, and 100 top-ranking keywords were once again obtained for each gene from the lists generated based on the TF-IGF and Z-score metrics. Only the keywords shared by at least two genes were considered, and the binary feature weight was used once again. The K-means algorithm was repeated 1000 times again with 9 clusters to get the optimal solution many times. Tables 3 and 4 show the purity results of clusters computed by the K-means algorithm based on keywords generated using TF-IGF and Z-scores, respectively, this time both within the context of the unrestricted background set. As in Tables 1 and 2, in the bottom row, Tables 3 and 4 also show the total number of distinct keywords used by the clustering algorithm across all 44 genes.

Table 3. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs, with keywords based on TF-IGF computed in the context of the unrestricted background set.

              Top 10  Top 20  Top 30  Top 50  Top 70  Top 100
Micro purity  0.682   0.614   0.682   0.659   0.636   0.636
Macro purity  0.749   0.674   0.640   0.696   0.716   0.674
Keywords      247     417     590     885     1168    1563

Table 4. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs, with keywords based on Z-scores computed in the context of the unrestricted background set.

              Top 10  Top 20  Top 30  Top 50  Top 70  Top 100
Micro purity  0.682   0.659   0.614   0.636   0.636   0.568
Macro purity  0.708   0.699   0.708   0.728   0.630   0.663
Keywords      309     547     747     1139    1526    2067

Tables 3 and 4 indicate that, in the context of an unrestricted background set, the purity of clustering with either metric is perhaps comparable to that with the other. The total number of keywords extracted with TF-IGF is always fewer than that with Z-scores, though, indicating faster clustering with TF-IGF.
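For reference, the micro- and macro-averaged purity used in Tables 1–4 can be computed as in the sketch below (illustrative Python, not the authors' code; the toy clusterings are hypothetical). It follows the description given earlier: for each computed cluster, the best overlap with any expert cluster is found, and the resulting ratios are either averaged (macro) or combined as a single ratio of sums (micro).

```python
# Illustrative sketch: micro- and macro-averaged purity of a computed clustering
# against an expert clustering; each clustering is a list of sets of gene names.
def purity(computed, expert):
    best_matches = [max(len(c & e) for e in expert) for c in computed]  # best overlap per cluster
    sizes = [len(c) for c in computed]
    macro = sum(m / s for m, s in zip(best_matches, sizes)) / len(computed)  # average of ratios
    micro = sum(best_matches) / sum(sizes)                                   # ratio of sums
    return micro, macro

# Hypothetical example:
computed = [{"ace2", "swi5"}, {"cdc21", "mnn1", "clb2"}]
expert   = [{"ace2", "swi5", "clb2"}, {"cdc21", "mnn1"}]
print(purity(computed, expert))
```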
The purities of the clusterings based on keywords with the N top-ranking Z-scores computed relative to both the restricted and unrestricted background sets are compared next, from Tables 2 and 4. It can readily be seen that the purity is much better across the board for the clusterings computed in the context of the unrestricted background set than for the clusterings computed in the context of the restricted background set. The positive impact of the unrestricted background set is also evident from a comparison of the numbers of keywords used in computing the clusterings for each threshold of top-ranking keywords. Fewer keywords in the context of the unrestricted background set obviously mean more keywords are shared between different genes. These results substantiate the original hypothesis for the Z-score that an unrestricted background set allows for identification of more shared keywords for genes, and consequently, better clustering by gene function.

Now, consider the purity of the clusterings generated by keywords with the top N TF-IGF scores, computed in the context of the restricted and unrestricted background sets, respectively, from Tables 1 and 3. These results with the TF-IGF metric are less categorical in terms of cluster purity than those with Z-scores, although the purity results with the unrestricted background set are somewhat better or the same in most (actually 75%) of the cases presented, and in all the cases when 50 or more top-ranked keywords are considered. Another interesting point to note is that, as with the Z-score, there is always a smaller number of distinct words among the top N ranking words when the unrestricted background set is used, indicating that more keywords are shared within the context of an unrestricted background set. The chances of more keywords being shared are higher when more keywords are considered, in general, which is borne out by the previous observation that purity is clearly improved with the unrestricted background set when 50 or more keywords are considered. These results once again lend substantial support to the original hypothesis that, even for TF-IGF, use of a broader or unrestricted background set is better for functional clustering of genes than a narrower or more restricted one.

4 Summary and Conclusion

In this paper, two metrics have been reviewed for identifying keywords that have a strong association with a particular concept of interest, such as a gene, based on the prevalence of the keyword in documents that are about the concept, contrasted with the keyword's distribution in a general "background" set of documents. The two metrics used in working with a set of 44 yeast genes are the standard statistical metric of Z-score and an extension of the classic TF-IDF weight metric from information retrieval, which has been named TF-IGF. The initial hypothesis is that different choices of background sets of documents lead to keywords with somewhat different properties suitable for different purposes. In relation to the ability of keywords to uniquely characterize the genes, especially as distinguished from other genes, TF-IGF seemed to yield somewhat better keywords, as judged by an expert. Some weak evidence was also found for the hypothesis that a restricted background set might be more suitable for identifying keywords that are likely to uniquely characterize the genes in the context of the Z-score.
As for clustering of genes, TF-IGF produced keywords that led to clustering with better purity than the Z-score, with either background set. The results were also achieved with fewer keywords with TF-IGF than with the Z-score, which is an additional bonus that leads to faster clustering. In addition, strong evidence was found for the hypothesis that an unrestricted background set is more suitable, with either Z-score or TF-IGF, for identifying keywords that could potentially be shared between different genes and are thus more suitable for use in the K-means clustering algorithm. The evidence was supportive of the original hypothesis in two respects: a higher averaged cluster purity was obtained, with fewer keywords, with the unrestricted background set, irrespective of whether Z-score or TF-IGF was used to identify the keywords.

A final observation about our hypothesis concerning the impact of the choice of background set on keyword quality for characterizing each gene and on clustering of genes needs to be made in relation to the Z-score metric. For characterizing each gene distinctly, as many unique keywords as possible (preferably not shared with many other genes) need to be identified for each gene. For clustering of genes based on shared function, it is important to allow more keywords with strong association to the genes to be shared between multiple genes. The results presented in Sects. 3.1 and 3.2 support this hypothesis much more decisively for the Z-score metric than for TF-IGF. Indeed, this is not so surprising because the hypothesis has a more intuitive basis in the definition of the Z-score metric!

Acknowledgments. The authors acknowledge that the MEDLINE® data used in this research are covered by a license agreement supported by the U.S. National Library of Medicine. Thanks are also due to Professor Rajnish Singh (Kennesaw State University) for her assistance in relation to evaluating the keywords for the various genes, and for her help in other ways related to this work.

References

1. Andrade, M., Valencia, A.: Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families. Bioinformatics 14(7), 600–607 (1998). https://doi.org/10.1093/bioinformatics/14.7.600
2. Cherepinsky, V., Feng, J., Rejali, M., Mishra, B.: Shrinkage based similarity metric for cluster analysis of microarray data. Proc. Natl. Acad. Sci. USA 100(17), 418–427 (2003). https://doi.org/10.1073/pnas.1633770100
3. Dasigi, V., Karam, O., Pydimarri, S.: An evaluation of keyword selection on gene clustering in biomedical literature mining. In: Proceedings of Fifth IASTED International Conference on Computational Intelligence, pp. 119–124 (2010). http://www.actapress.com/Abstract.aspx?paperId=43008
4. Hamdan, H., Bellot, P., Béchet, F.: The impact of Z-score on Twitter sentiment analysis. In: Proceedings of 8th International Workshop on Semantic Evaluation, pp. 596–600 (2014). https://doi.org/10.3115/v1/s14-2113
5. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a K-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979). https://doi.org/10.2307/2346830
6. Ikeda, D., Suzuki, E.: Mining peculiar compositions of frequent substrings from sparse text data using background texts. In: Proceedings of European Conference on Machine Learning and Knowledge Discovery in Databases, Springer Lecture Notes in Artificial Intelligence, vol. 5781, pp. 596–611 (2009).
https://doi.org/10.1007/978-3-642-04180-8_56
7. Liu, Y., Navathe, S., Pivoshenko, A., Dasigi, V., Dingledine, R., Ciliax, B.: Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes. Int. J. Data Min. Bioinform. 1(1), 88–110 (2006). https://doi.org/10.1504/ijdmb.2006.009923
8. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24, 513–523 (1988). https://doi.org/10.1016/0306-4573(88)90021-0

A Cloud-Based Decision Support System Framework for Hydropower Biological Evaluation

Hongfei Hou1,2(✉), Zhiqun Daniel Deng1,3, Jayson J. Martinez1, Tao Fu1, Jun Lu1, Li Tan2, John Miller2, and David Bakken4
1 Pacific Northwest National Laboratory, Energy and Environment Directorate, Richland, WA 99352, USA
hongfei.hou@wsu.edu
2 School of Engineering and Applied Sciences, Washington State University Tri-Cities, 2710 Crimson Way, Richland, WA 99354, USA
3 Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
4 School of Electrical Engineering and Computer Science, Washington State University, 355 NE Spokane St., Pullman, WA 99163, USA

Abstract. Hydropower is one of the most important energy sources: it accounts for more than 80% of the world's renewable electricity and 16% of the world's electricity. Significantly more hydropower capacity is planned to be developed. However, hydro-structures, including hydroelectric dams, may have adverse biological effects on fish, especially on migratory species. For instance, fish can be injured or even killed when they pass through turbines. This is why biological evaluations of hydro-structures are needed to estimate fish injury and mortality rates. The Hydropower Biological Evaluation Toolset (HBET) is an integrated suite of science-based desktop tools designed to evaluate whether the hydraulic conditions of hydropower structures are fish friendly by analyzing collected data and providing estimated injury and mortality rates. The Sensor Fish, a small autonomous sensor package, is used by HBET to record data describing the conditions that live fish passing through a hydropower structure will experience. In this paper, we present a plan to incorporate cloud computing into HBET and migrate it into a cloud-based decision support system framework for hydropower biological evaluation. These enhancements will make the evaluation system more scalable and flexible; however, they will also introduce a significant challenge: how to maintain security while retaining scalability and flexibility. We discuss the technical methodologies and algorithms for the proposed framework, and analyze the relevant security issues and associated security countermeasures.

Keywords: Decision support system · Hydropower · Dam · Fish injury · Fish-friendly turbine

1 Introduction

A decision support system (DSS) is a type of interactive knowledge-based software that uses predefined models to process data inputted from various data sources to help businesses and organizations in decision-making activities [1]. A DSS is composed of three
fundamental components [2]: a data management component imports/stores data and provides data access to other components; a decision-making component, containing predefined decision-making models, compiles useful information from the data provided by the data management component to make decisions; and a presentation component enables users to interact with the system (Fig. 1). A DSS will be incorporated into the Hydropower Biological Evaluation Toolset (HBET), an integrated suite of science-based desktop tools designed to evaluate the degree to which the hydraulic conditions of hydropower structures (e.g., turbine, spillway, overshot weir, undershot weir, and pumped storage) affect entrained fish by analyzing the collected data and providing estimated injury and mortality rates based on experimentally derived, species-specific dose-response relationships [3].

Fig. 1. Architecture of DSS [4].

Hydropower is one of the most important energy sources, accounting for more than 80% of the world's renewable electricity and about 16% of the entire world electricity supply [5]. Significantly more hydropower capacity is planned to meet demand [5]. However, hydro-structures, including hydroelectric dams and hydraulic turbines, may have adverse biological effects on fish, especially on migratory species. For example, fish can be injured or even killed when they pass through turbines [6–10]. This is why biological evaluations of hydro-structures are needed to estimate fish injury and mortality rates. HBET uses the Sensor Fish (SF), a small autonomous sensor package instrument [11], to collect data describing the conditions that would be experienced by live fish passing through a hydro-structure. SF and HBET can support evaluations of turbines and sites, including physical components (barriers, trash racks, spillways, etc.) that fish interact with during downstream passage, to identify the most fish-friendly alternatives.

Currently HBET is platform dependent, and is available only to users who have access to computers where HBET has been installed. To increase its availability and usage, we will incorporate cloud computing into HBET and migrate it into a cloud-based DSS framework for hydropower biological evaluation. This will make HBET available to users no matter where they are, as long as they have an internet connection. Users would always use the latest version without installation and upgrading. This will also make HBET more scalable and flexible in incorporating new dose-response relationships, new fish species, and new study types. However, at the same time, it introduces a significant concern: how to maintain system security without adversely impacting scalability and flexibility, so that no proprietary information is compromised. In this paper, we discuss the technical methodologies and algorithms for the proposed framework, and analyze the relevant security issues and associated security countermeasures.

2 Overview of the Framework

The framework contains three major components, which reside in the cloud (Fig. 2). The first component is data acquisition and integration (DAI), which contains modules that receive data. There are two types of data sources for hydropower evaluation: the internal database and external SF files. The second component is decision-making (DM), which contains modules to estimate injury and mortality rates for different fish species.
For example, one module could estimate the barotrauma mortal injury rate, and another, the major injury rate due to shear. Currently HBET can assess strike, shear, and barotrauma stressors. Fish species it supports include Chinook salmon, Australian bass, Gudgeon, Murray cod, and Silver perch. Any incorporation of new stressors or new fish species can be added through this component. The third component is data validation and self-monitoring (DVSM), which contains modules that validate input data for DAI modules and monitor outputs and behaviors of DM modules. For each module in the first two components, there will be a corresponding data validation and self-monitoring module. Each component can adopt as many modules as needed, and these modules can be used for different purposes (i.e., different fish species, different study types, and so on). For example, the DM component can contain DM modules for Chinook salmon and for Australian bass. Similar to a typical DSS, the proposed framework also includes a knowledge base, which includes information such as rules, logic, and corresponding conditions.

Fig. 2. Architecture of the proposed framework.

In the proposed framework, we will implement four countermeasures to address cloud-related security concerns:
1. Introduce DVSM modules into the framework. Data validation is used to filter out invalid input data. Self-monitoring will use data mining to predict each DM module's output, and compare the predicted result with the actual output from the DM module to determine if the output is expected. Self-monitoring will also monitor modules' behavior, such as resource usage and execution time (i.e., the proposed framework would monitor its own behavior during runtime).
2. Use data encryption so that data cannot be interpreted even if it is exposed to unauthorized users.
3. Use a login token and temporary password so that any request to the cloud interfaces and APIs requires a valid login.
4. Create a module set for each study type so that a module set's failure in one study type does not affect other study types.

3 System Security and Countermeasures

In order to analyze the security level of the proposed framework, we first need to identify its vulnerabilities. There are common vulnerabilities that exist in all types of DSSs, such as security issues in account authentication and lack of security education [12]. In this research, we focus on the vulnerabilities that exist only in cloud-based DSSs, but not in desktop ones:
1. Insecure cloud interfaces and APIs. Cloud-based systems provide cloud interfaces and APIs [13] through which to communicate with other systems and/or devices, and thus their security will depend on the security of the cloud interfaces and APIs. These issues include insecure cloud interfaces, immature cloud APIs, insufficient inputted data validation, and insufficient self-monitoring [14].
2. Resource overbooking. Resources can be overused if the modules in the cloud-based DSS are modeled inaccurately [15]. This can also happen if attackers intentionally design a module to allocate or occupy resources without limits. If resource overbooking occurs, services of a cloud-based DSS will become unavailable (i.e., the DSS will be inaccessible). Typical methods employed by attackers to overbook resources include unlimited memory allocation, unlimited occupation of storage, and unlimited occupation of bandwidth.
3. Data exposure.
Input data should only be accessible and exposed to the desired DAI modules, and output from DM modules should only be exposed to the desired devices or DVSM modules. However, since the data or training set data are saved in the cloud-based database, they can be co-located with data owned by competitors or intruders because of weak separation [16].
4. Vulnerabilities in virtual machines and hypervisors. Cloud-based systems will run in virtual machines or hypervisors. Compromises occurring in virtual machines and hypervisors may introduce data leakage [17] and resource overbooking.

Threats in cloud computing will also exist in a cloud-based DSS. There are 12 top security threats faced by cloud-based services, called the "Treacherous 12" [18], as shown in the first 12 rows of Table 1. "Data breach" is always the major concern for all systems, including both desktop-based systems and cloud-based ones. "Data encryption" can be applied to saved data so that data cannot be interpreted even if it is breached. "Insufficient identity, credential, and access management", "system vulnerabilities", "account or service hijacking", "malicious insiders", "advanced persistent threats", "data loss", and "insufficient due diligence" can bring security risks to input data, stored data, and the systems themselves. In this research, we aim to maintain flexibility and scalability while retaining security when migrating a desktop DSS into a cloud-based one. Thus, we will only focus on the threats that are unique to cloud-based systems and some threats that are major concerns: "data breaches", "insecure cloud interfaces and cloud APIs", "abuse and nefarious use of cloud services", "denial of services", and "shared-technology vulnerabilities". The countermeasures we will implement in the proposed framework are designed to address these threats.

Table 1. Threats in cloud-based DSS

Data breaches: Data will be breached if it is accessed by unauthorized services or function calls, or when authorized services or function calls use the data in an improper way. Data breaches are not unique to cloud-based DSSs, but they are the top concern for cloud-based DSS users [15, 19].
Insufficient identity, credential, and access management: If identity, credential, and access management is not sufficient, sensitive data can be exposed to unauthorized entities, and data and applications can be manipulated unexpectedly [19].
Insecure cloud interfaces and cloud APIs: Cloud interfaces and cloud APIs are fundamental parts of cloud-based DSSs. They are the bridges between system components and databases. If the cloud interfaces and APIs are not secure, attackers can use them to access data and perform actions as often as they wish.
System vulnerabilities: System vulnerabilities include bugs or issues in operating systems or software. Exploiting system vulnerabilities is a common way for attackers to carry out their attacks.
Account or service hijacking: After hijacking an account or service, attackers can bypass the authentication process and then pretend to be legitimate users, operators, or software developers, in order to achieve their goals [19].
Malicious insiders: Malicious insiders can cause much more damage than other threats. For example, a system administrator can access any data and any application, and thus can inflict any kind of damage.
Advanced persistent threats: Advanced persistent threats (APTs) are cyberattacks used to gain control over systems in order to steal data.
Data loss: Input data or stored training set data can be deleted or erased once attackers take control of a system. This can also occur because of human error, but that is not the focus of this research.
Insufficient due diligence: Without due diligence, wrong technologies or wrong system configurations can be applied. This introduces a potentially large risk.
Abuse and nefarious use of cloud services: If cloud services are not secured, they can be abused to achieve certain specific goals; for example, email spam.
Denial of services: If a resource is overused, the system may have no resources left to process any incoming legitimate requests.
Shared-technology vulnerabilities: Sharing technology makes cloud services more scalable. However, it brings vulnerabilities at the same time.
Insecure virtual machines and hypervisors: If the virtual machines or hypervisors of the cloud-based system are not secure, the cloud-based system will be at risk.

The First Countermeasure is to Introduce DVSM Modules into the Framework. Each time a new DM module is added to the system, the system will use the inputted domain (e.g., Turbine) and subdomain information (e.g., Francis) to find a matching DVSM module, and then create a new instance of the module found for the newly added module. If there is no existing module, the system will display user interfaces to request information to generate the corresponding DVSM module. There are three steps to collect information. The first step is to collect information about data validation, such as data type or data sequence format. The second step is to collect information about self-monitoring, such as conditions and corresponding actions when conditions are met. For example, if execution time exceeds 30 s, change the module's status to "Suspicious". The third step is to input a training data set, which will be used for the self-monitoring part of the newly generated module. The training data set will be saved into the database for further reference. The newly generated DVSM module and training data set will be reviewed and verified before being put into use.

In the data validation part of each DVSM module, we use a structured validation consisting of several operations for all newly acquired data [20]. The first operation is to check whether the input datasets are in the correct format. For example, the dataset collected from the Sensor Fish should be " ". The second operation is to check the data type for each field. For example, the data type of "pressure" should be "float". The third operation is a data range check, to make sure the acquired data are within reasonable limits. For example, the pressure should be greater than 0 psi. The last operation, a data frequency check, is to make sure that data are collected at the expected intervals.
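A minimal sketch of these four validation operations is shown below (illustrative Python, not the HBET implementation; the field names, sampling interval, and limits are placeholders, since the exact Sensor Fish record format is not reproduced in this paper).

```python
# Illustrative sketch: structured validation of Sensor Fish records
# (format, type, range, and frequency checks). Records are assumed to be dicts.
def validate_records(records, expected_fields=("time", "pressure"),
                     expected_interval=0.0005, tolerance=0.0001):
    errors = []
    for i, rec in enumerate(records):
        # 1. Format check: every expected field must be present
        if any(f not in rec for f in expected_fields):
            errors.append((i, "missing field"))
            continue
        # 2. Type check: e.g., pressure must be a float
        if not isinstance(rec["pressure"], float):
            errors.append((i, "wrong type for pressure"))
        # 3. Range check: e.g., pressure must be greater than 0 psi
        elif rec["pressure"] <= 0.0:
            errors.append((i, "pressure out of range"))
    # 4. Frequency check: samples must arrive at the expected interval
    times = [rec["time"] for rec in records if "time" in rec]
    for t0, t1 in zip(times, times[1:]):
        if abs((t1 - t0) - expected_interval) > tolerance:
            errors.append(("interval", "unexpected sampling interval"))
            break
    return errors
```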
In the self-monitoring part of each DVSM module, we will implement the k-nearest neighbors algorithm (KNN), a non-parametric algorithm with lazy learning [21], for data mining on modules' outputs and behaviors. We chose KNN for the following reasons: KNN is efficient because the lazy-learning algorithm can use the training data set without any generalization; KNN has been used widely and can be applied to data with arbitrary distributions because it is non-parametric; and KNN is ranked among the top 10 data mining algorithms [22].

In this research, we use the SF as the data source. For each study, we deploy SF at the desired study site to get a sufficient sample size for statistical analysis and the required precision. Each time an SF is released, the corresponding hydro-structure's environmental characteristics are recorded. After all SFs are released and recovered, the data files are downloaded from the SF. DAI modules will upload these downloaded SF files into the system, and then pass the interpreted data into the hydropower evaluation DM modules. DVSM modules will use the attributes shown in Table 2 to monitor the outputs from the DM modules. Table 3 is part of the training data set, which contains combinations of attribute values and expected outputs. Multiple classes describe the modality injury rates associated with the corresponding stressors: BMIR refers to barotrauma mortal injury rate, and SMIR refers to shear major injury rate.

Table 2. Attribute list to monitor the HBET decision-making module

DN: Domain name, such as Hydropower Biological Evaluation (represented as an integer; e.g., "1"). Read from the configuration files.
STN: Study-type name, such as Turbine (represented as an integer; e.g., "1"). Read from the configuration files.
SSTN: Sub-study-type name, such as Francis (represented as an integer; e.g., "0"). Read from the configuration files.
FS: Fish species studied, such as Chinook salmon (represented as an integer; e.g., "11").
AFD: Actual total flow discharge of the study site, in thousands of cubic feet per second.
TFD: Target total flow discharge of the study site, in thousands of cubic feet per second.
APG: Actual power generation of the study site, in megawatts.
TPG: Target power generation of the study site, in megawatts.
BP: Barometric pressure measured when the SF is released, in pounds per square inch.
ERD: Estimated release depth when the SF is released, in feet.
BA: Blade angle of the turbine, in percentage.
WGO: Wicket gate open percentage.
TE: Tailwater elevation of the study site, in feet.
FB: Forebay elevation of the study site, in feet.
HHE: Hydraulic head elevation of the study site, in feet.

Table 3. Training data to monitor the HBET decision-making module

DN    1           1           1           1           1
STN   1           1           1           1           1
SSTN  0           0           0           0           0
FS    11          11          11          11          11
AFD   50.087894   50.017833   50.143431   50.094383   50.043149
TFD   80          80          80          80          80
APG   92.438843   91.543232   93.431293   91.738209   91.637234
TPG   150         150         150         150         150
BP    14.721021   14.697332   14.719908   14.716734   14.700632
ERD   127.989454  124.548293  126.431829  125.438219  126.008943
BA    0.15        0.15        0.15        0.15        0.15
WGO   0.57        0.57        0.57        0.57        0.57
TE    17.895445   17.047384   18.089433   17.894343   18.047854
FB    120.483943  119.483943  120.894320  119.823343  120.439083
HHE   102.588498  102.436559  102.804887  101.929000  102.391229
BMIR  0.045684    0.053612    0.047534    0.048893    0.051234
SMIR  0.021367    0.031267    0.029123    0.035623    0.013434

After processing each SF data file, DVSM modules will retrieve the corresponding information for each attribute shown in Table 2 to generate a vector, which is then used to calculate the Euclidean distance (ED; the square root of the sum of the squared differences between the corresponding values of two vectors, Eq. 1) [23] against each row of the training set:

ED(x, y) = √( Σ_{i=1}^{n} |x_i − y_i|² )    (1)
After calculating the EDs for all rows in the training set, the system adds the results as a new column to the training set and sorts it by ED in ascending order. The predicted class is the majority class among the top K rows, which is then compared with the result produced by the corresponding DM module for the given input data set. Comparison results are accumulated to calculate the error rate. In this research, we chose 11 as the value of K, based on the accuracy chart (Fig. 3).

Fig. 3. Relationship between the value of K and the accuracy of KNN.

The Second Countermeasure is to Use Data Encryption. When registering to use the cloud-based framework, an organization will be provided with a public/private key set. Input data will be interpreted by the DAI module, and then encrypted using the provided public key and saved into the database. The training data set will also be encrypted and saved into the database. For the DM module, any time it retrieves data from the database, the private encryption key will be used to decode the data for further processing. The public key is shared with the public, and the private key file is distributed by the organization only to authorized users and saved on an encrypted USB drive. When using the cloud-based DSS, the USB drive holding the private key file should be connected. Without the private key, the data cannot be interpreted.

The Third Countermeasure is to Use a Login Token and Temporary Password. A login token is generated when a user logs into the system, and it expires whenever the user logs out or the session times out. When the server side processes the login request, it will first validate it by checking the username and password. If the validation passes, it then sends a temporary password to the user's email or cell phone, based on the user's selection. The user must input the correct temporary password for the system to successfully log the user in and generate the login token. The generated token is used for each call of the cloud interfaces and application programming interfaces (APIs) as one of the properties in the parameter JavaScript Object Notation (JSON) object. When the server side of the cloud-based DSS receives a service request with the passed-in JSON object, it first retrieves and validates the login token. If the login token is valid, the system processes the request and moves forward. Otherwise, the request is discarded.

The Fourth Countermeasure is to Create a Module Set for Each Study Type. For example, for a Turbine study, there will be a module set including a DAI module and a corresponding DVSM module, and a DM module and a corresponding DVSM module. Thus, failure of any module in a Turbine study's module set will be isolated from other study types. Besides applying the above-mentioned countermeasures, we will also act quickly on suggestions from system providers, such as upgrading or installing patches.
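Returning to the KNN-based self-monitoring check described above, the sketch below (illustrative Python, not the HBET implementation; the training rows and attribute vectors are placeholders) shows the essential steps: the distance of Eq. (1) to every training row, sorting, a majority vote over the top K = 11 rows, and a comparison with the DM module's actual output.

```python
# Illustrative sketch: KNN check for DVSM self-monitoring, following Eq. (1).
# Each training row is (attribute_vector, expected_class); K = 11 as in the text.
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(x, y)))   # Eq. (1)

def knn_predict(query, training_rows, k=11):
    ranked = sorted(training_rows, key=lambda row: euclidean(query, row[0]))
    top_classes = [cls for _, cls in ranked[:k]]
    return Counter(top_classes).most_common(1)[0][0]   # majority class among top K

def flag_if_unexpected(query, dm_output_class, training_rows, k=11):
    # A mismatch between the KNN prediction and the DM module's actual output
    # contributes to the module's accumulated error rate.
    return knn_predict(query, training_rows, k) != dm_output_class
```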
Table 4 shows the specific countermeasures proposed for the threats that may be encountered.

Table 4. Threats and countermeasures

Data breaches: Data encryption; login token and temporary password; self-monitoring.
Insufficient identity, credential, and access management: Data encryption; login token and temporary password; self-monitoring.
Account or service hijacking: Login token and temporary password; self-monitoring.
Advanced persistent threats: Login token and temporary password; self-monitoring.
Insecure cloud interfaces and cloud APIs: Login token and temporary password; self-monitoring.
Abuse and nefarious use of cloud services: Data validation and scanning; self-monitoring.
Denial of services: Data validation and scanning; self-monitoring.
Shared-technology vulnerabilities: Independent component set; data encryption; login token and temporary password; data validation and scanning; self-monitoring.

For threats not listed in Table 4, we will take actions suggested by system providers. By applying the latest upgrades and installing the latest patches, we can prevent the security risks due to "system vulnerabilities". By improving employee screening and hiring practices, we can reduce the issues that can be caused by "malicious insiders". Providing sufficient security education will significantly improve the security level
By con?guring the virtual machine as suggested by the vendors, applying all security patches, installing all security upgrades, and pursuing regular monitoring, risks introduced by virtual machine vulnerabilities will be controlled. We conclude that the proposed framework can maintain security when migrating from a desktop DSS. For future work, we will use this paper as the basis to implement the proposed cloud-based DSS framework and deploy it into the cloud. Acknowledgments. The work described in this article was funded by the U.S. Department of Energy Water Power Technologies O?ce. A Cloud-Based Decision Support System Framework 527 References 1. Power, D.J.: Decision Support Systems: Concepts and Resources for Managers. Greenwood Publishing Group, Santa Barbara (2002) 2. Sage, A.P.: Decision Support Systems Engineering, 1st edn. Wiley, Hoboken (1991). ISBN-10: 047153000X, ISBN-13: 978-0471530008 3. Hou, H., Deng, Z.D., Martinez, J., Fu, T., Duncan, J.P., Johnson, G.E., Lu, J., Skalski, J.R., Townsend, R.L., Tan, L.: A hydropower biological evaluation toolset (HBET) for characterizing hydraulic conditions and impacts of hydro-structures on ?sh. Energies 11(4), 990 (2018) 4. Turban, E., Aronson, J.E.: Decision Support Systems and Intelligent Systems, 6th edn. Prentice Hall, Upper Saddle River (2001). ISBN:0130894656, 9780130894656 5. REN21: Renewables 2016 Global Status Report (Paris: REN21 Secretariat) (2016). ISBN: 978-3-9818107-0-7 6. Brown, R.S., Colotelo, A.H., P?ugrath, B.D., Boys, C.A., Baumgartner, L.J., Deng, Z.D., Silva, L.G.: Understanding barotrauma in ?sh passing hydro structures: a global strategy for sustainable development of water resources. Fisheries 39(3), 108–122 (2014) 7. Cada, G.F.: The development of advanced hydroelectric turbines to improve ?sh passage survival. Fisheries 26(9), 14–23 (2001) 8. Cushman, R.M.: Review of ecological e?ects of rapidly varying ?ows downstream from hydroelectric facilities. N. Am. J. Fish. Manag. 5(3A), 330–339 (1985) 9. Pracheil, B.M., DeRolph, C.R., Schramm, M.P., Bevelhimer, M.S.: A ?sh-eye view of riverine hydropower systems: the current understanding of the biological response to turbine passage. Rev. Fish Biol. Fish. 26(2), 153–167 (2016) 10. Trumbo, B.A., Ahmann, M.L., Renholds, J.F., Brown, R.S., Colotelo, A.H., Deng, Z.D.: Improving hydroturbine pressures to enhance salmon passage survival and recovery. Rev. Fish Biol. Fish. 24(3), 955–965 (2014) 11. Deng, Z.D., Lu, J., Myjak, M.J., Martinez, J.J., Tian, C., Morris, S.J., Carlson, T.J., Zhou, D., Hou, H.: Design and implementation of a new autonomous sensor ?sh to support advanced hydropower development. Rev. Sci. Instrum. 85(11), 115001 (2014) 12. Hashizume, K., Rosado, D.G., Fernández-Medina, E., Fernandez, E.B.: An analysis of security issues for cloud computing. J. Internet Serv. Appl. 4, 5 (2013) 13. Dawoud, W., Takouna, I., Meinel, C.: Infrastructure as a service security: challenges and solutions. In: The 7th International Conference on Informatics and Systems (INFOS), pp. 1– 8. IEEE Computer Society (2010) 14. Carlin, S., Curran, K.: Cloud computing security. Int. J. Ambient Comput. Intell. 3(1), 14– 19 (2011) 15. Catteddu, D.: Cloud computing: bene?ts, risks and recommendations for information security. In: Serrão, C., Aguilera Díaz, V., Cerullo, F. (eds.) Web Application Security. Communications in Computer and Information Science, vol. 72. Springer, Berlin (2010) 16. Viega, J.: Cloud computing and the common man. Computer 42(8), 106–108 (2009) 17. 
Rittinghouse, J.W., Ransome, J.F.: Cloud Computing: Implementation, Management, and Security. CRC Press, Boca Raton (2009). ISBN 9781439806807 18. Violino, B.: The Dirty Dozen: 12 Top Cloud Security Threats for 2018. CSO Online (2018) 19. Cloud Security Alliance: Top Threats to Cloud Computing. V1.0 (2010) 20. Zio, M.D., Fursova, N., Gelsema, T., Gießing, S., Guarnera, U., Petrauskien, J., Kalben, L.Q., Scanu, M., Bosch, K.O., Loo, M., Walsdorfer, K.: Methodology for Data Validation 1.0. ESSnet ValiDat Foundation (2016) 528 H. Hou et al. 21. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992) 22. Zhang, Z.: Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med. 4(11), 218 (2016) 23. Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y.: A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE 10(12), e0144059 (2015) 24. Popovic, K., Hocenski, Z.: Cloud computing security issues and challenges. In: Proceedings of the 33rd International Convention MIPRO, pp. 344–349. IEEE Computer Society, Washington DC (2010) A Cloud-Based Decision Support System Framework 529 An Attempt to Forecast All Different Rainfall Series by Dynamic Programming Approach Swe Swe Aung1,3(&) , Shin Ohsawa2 , Itaru Nagayama3 , and Shiro Tamaki3 1 Department of Software, University of Computer Studies, Taunggyi, Myanmar 2 Weathernews Inc., Okinawa, Japan 3 Department of Information Engineering, University of the Ryukyus, Okinawa, Japan {sweswe,nagayama,shiro}@ie.u-ryukyu.ac.jp Abstract. Unexpected heavy rainfall has been seriously occurred in most parts of the world, especially during monsoon season. As a serious consequence of heavy rainfall, the people in those areas battered by heavy rainfall faced many hardship lives. Without exception, prevention is the best way of minimizing these negative effects. In spite of all, we developed a rainfall series prediction system for different series patterns by applying the dynamic programming approach aiming to acquire the rainfall level of the whole rainfall cycle. The simple idea behind the proposed dynamic programming approach is to ?nd the similarity of two rainfall sequences upon the maximum match of the rainfall level of those sequences. Based on 2011 and 2013 real data sets collected from WITH radar, which is installed on the rooftop of Information Engineering, University of the Ryukyus, the comparison between the conventional approach (Polynomial Regression) and the proposed approach is investigated. These correlation experiments con?rm that the dynamic programming approach is more ef?cient for predicting different rainfall series. Keywords: Dynamic programmingRainfall seriesPolynomial regression WITH radar 1 Introduction Rainfall forecasting in meticulous practice plays an important role in predicting the severe natural disasters with a view to prevent the potential threats and damages. As reported by online news, heavy rainfall lashed Sierra Leone in Africa on August 14, 2017, and left the region with landslides and mudslides due to heavy flooding. On June 13, 2017, torrential rainfall hit Bangladesh and triggered deadly mudslides in that region. The same deadly damages caused by guerrilla rainfall occurred in Sri Lanka during the ?nal week of May 2017. On July 5, 2017, many people went missing in the massive landslides and floods from heavy rainfall that battered Fukuoka, Japan. 
On July 21, 2017, the heaviest rainfall hit lower Myanmar, and many people were temporarily displaced due to landslides and floods. Figure 1 shows the flood in the city of Nago, Okinawa, Japan, caused by heavy rain on July 9, 2014. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 530–547, 2019. https://doi.org/10.1007/978-3-030-02686-8_40 The damage caused by guerrilla rainfall points out the importance of localized rainfall prediction with accurate estimations to prevent the after-effects. A quick change in rainfall is one of the most dif?cult factors in making a decision about a long-term prediction. The states between developing and decaying cumulonimbus clouds can alter very rapidly. Fortunately, the small-dish WITH aviation radar includes functions for observing and capturing rapidly developing cumulonimbus clouds in high resolution to deal with those dif?culties. For this purpose, a prediction model is designed by using the concept of dynamic programming algorithm. Dynamic programming approach is a powerful tool for solving the problem of investigating the similarity between two pairs of rainfall series. The similarity between two rainfall sequences is de?ned according to the maximum match number of rainfall levels. Direct comparison of two rainfall sequences is not completely an appropriate matching to compute the similarity and generate the rainfall level relationships between those two rainfalls. Thus, dynamic programming came to our attention as an approach that is a good ?t for predicting different rainfall series pattern. Another study for predicting the whole rainfall series is one of conventional curve ?tting approaches (polynomial regression). The concept of polynomial regression is to generate a prediction model of independent variable x and dependent variable y cor-responding to the nonlinear relationship between them. In this paper, the two approaches primarily aim to investigate the most similar rainfall series for newcomers are presented. The rest of this paper is organized as follows. Section 2 describes related works. Section 3 discusses WITH radar and how to generate rainfall level. Section 4 describes the phenomenon of localized rainfall. Section 5 details the construction of rainfall level data model and Sect. 6 details with dynamic programming model for rainfall series prediction. Section 7 is about polynomial regression and Sect. 8 is analytical result and discussion. Section 9 is the conclusion. Fig. 1. Okinawa in Japan floods caused by heavy rain on July 9, 2014 [1]. An Attempt to Forecast All Different Rainfall Series 531 2 Related Works The level or amount of rainfall prediction system for short-term period has being implemented by many researchers in many parts of the countries by applying various prediction methodologies to different kinds of rainfall data resources. In this case, many powerful machine learning approaches have come to the attention of researchers for the short-term rainfall prediction systems. In [2], the authors proposed a system for prediction of rainfall using radar reflec-tivity data by applying ?ve machine learning approaches (neural network, random forest, classi?cation and regression tree, support vector machine, and k-nearest neighbor) in a watershed basin at Oxford. The purpose of the paper is to select one algorithm, which could predict the rainfall with the highest precision accuracy. 
As reported by the experimental results, arti?cial neural network MLP NN is the best performance in comparison to other algorithms. In [3], the authors designed a system for short-term rain forecasting system in the northeastern part of Thailand by applying machine learning techniques (decision tree (DT), arti?cial neural network (ANN) and support vector machine (SVM)). According to the comparative results, arti?cial neural network and support vector machine are more suitable for the prediction of short-term rainfall amount than decision tree. Aung et al. [4] proposed a short-term prediction of localized rainfall from radar images by applying dual-kNN approach aiming to forecast one-minute, three-minute, and ?ve-minute forecasts. They utilized dual-kNN approach in order to upgrade the ordinary classi?cation routines of classical k-nearest neighbors (k-NN) and to improve the prediction accuracy. They experimentally con?rmed with test cases and simulations that the performance of dual-kNN is more effective than classical k-NN. Inafuku et al. [5] designed a short-term prediction for guerrilla rainstorm by using state-transition method. For the short-term prediction, they introduced the rapid state-transition ones based on short-period sampling data to overcome the weakness of the classical state-transition method. Besides, they introduced the estimation method of the coordinates of center of gravity movement of rain cloud to get more precision forecast. In [6], the authors proposed approach for searching for similarities in the amino acid sequence of two proteins to determine whether signi?cant homology exist between the proteins by applying dynamic programming matching approach. The systems described above only emphasized on how to do the prediction of short-term rainfall prediction using various powerful machine learning approaches. In other words, it means that the system makes a forecast emphasizing on only one part of the rainfall series. Thus, this paper intends to predict the rainfall level of the whole rainfall series or rainfall circle by applying dynamic programming approach. Dynamic programming approach is a powerful approach for solving the problem of sequence decisions [7]. The underling idea is to ?nd the similarity of two sequence problems by applying alignment method. 532 S. S. Aung et al. 3 WITH Radar and Rainfall Level The small-dish aircraft radar dubbed WITH radar, which is owned by Weathernews Inc., is Doppler radar for observing and capturing cumulonimbus clouds that can cause torrential rainstorms. It has the following features [8]. • The diameter of the radar is about 1000 mm. • It can capture the development processes of cumulonimbus clouds that cause guerrilla rainstorms. • It can observe altitudes of 2 km and below. • Observations use the Doppler method. • The frequency is 9340 MHz (X-band). • Electric power is 30 W. • Sampling time is six seconds. • The observable range is a 50 km radius. • Spatial resolution is a 150 m mesh. Figure 2 combines three pictures, where the leftmost is the WITH radar installed on the rooftop of the Information Engineering building. The middle picture is a cross-section scan from the WITH radar that shows a cumulonimbus cloud forming near Okinawa Island. The rightmost photo is the color scheme for rainfall levels 0 to 14. The quantity of rainfall is de?ned by the equation 2.67 h Rain Level, corresponding to a quantity of precipitation from 00 mm/h to 40 mm/h. In Fig. 
2, the middle image represents a sample image from observation of a rain cloud constructed by cross-section scan. In this image, the weather radar locates the area where the suspected rain cloud produces a heavy rainstorm. The intensity of rainfall levels is represented by 15 different colors (black, off-white, sky blue, light blue, blue, dark blue, dark green, green, light green, light yellow, yellow, yellow-orange, light pink, pink, and red) as shown in the rightmost section of Fig. 2. Beyond that, the rainfall level in digital format is from 0 to 14, where 0 is clear (i.e. not raining). Light rain is from rainfall level 1 to 5, moderate rainfall is from level 6 to 11, and heavy rain is from level 12 to 14.

Fig. 2. Left to right: WITH radar; an observed localized cumulonimbus cloud near Okinawa; and the colors denoting the various rain levels.

Table 1 illustrates the intensity of each rainfall level in digital format. In this case, the intensity of each rainfall level is computed by applying the following (1):

Intensity of Rainfall = (Rainfall (mm/h) / Level Number) \times Rainfall Level   (1)

Table 1. Intensity of rainfall levels (intensity of rainfall level = 2.66 * rainfall level)
Level 0: 0; Level 1: 2.66; Level 2: 5.32; Level 3: 7.98; Level 4: 10.64; Level 5: 13.3; Level 6: 15.96; Level 7: 18.62; Level 8: 21.28; Level 9: 23.94; Level 10: 26.6; Level 11: 29.26; Level 12: 31.92; Level 13: 34.58; Level 14: 37.24

4 Phenomenon of Localized Rainfall

Usually, the development and decay of cumulonimbus clouds lasts from 30 min to 1 h. The phenomenon can occur over small islands, such as Okinawa. Figure 3 demonstrates 12 rainfall series. Extensively, Y axis represents the rainfall level in inches, and X axis denotes minutes. Figures 3 and 4 illustrate the phenomenon of torrential rainfall based on 2011 and 2013 weather data. In Fig. 3, it can be clearly seen that the red dots represent growth and blue dots represent decay. It is obvious that higher rainfall levels cover a larger rainfall area. The two go hand-in-hand. Figure 4 illustrates a series of torrential rainfall levels based on a time increment that lasted around 30 min. In this figure, the X axis is time, and the Y axis is rainfall level. Figure 5 demonstrates the development and decay conditions in torrential rainfall. It usually starts at a small size and becomes bigger. Finally, it slowly starts to decay. Actually, a rainfall cycle that usually lasts 30 min includes around 300 images, because the radar takes one picture every six seconds. From one rainfall cycle, we only used some of the more important images to illustrate the characteristics of the rainfall cycle in Fig. 5.

Fig. 3. Twelve Rainfall Series based on 2011 and 2013 Rainfall Data.
Fig. 4. Rainfall levels for torrential rain lasting about 30 min.
Fig. 5. Radar images of rainfall lasting about 30 min.

5 Rainfall Level Data Model Construction

Before going into the detailed discussion of dual-kNN, we want to discuss how to create a rainfall level data model for rainfall prediction. Figure 6 illustrates the rainfall level of each pixel, P(ri)(x, y), extracted from radar images, where {P(ri)(x, y): i = 0, 1, 2, ..., 14} denotes the rainfall level at coordinates (x, y), and i ∈ {0, 1, 2, 3, ..., 14}. Then, pixel values P(r0)(x, y), P(r1)(x, y), ..., P(r14)(x, y) represent each rainfall level in a single image. An image may contain different rainfall levels corresponding to the current captured image and weather conditions.
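As a small worked example of Eq. (1) and Table 1, the sketch below assumes (based on Sect. 3) that "Level Number" is the 15 digital levels and that the maximum rate is the 40 mm/h quoted there; this reproduces the 2.66 x level values of Table 1 up to rounding. The level categories follow the text above.

```python
def rain_intensity(level, max_rate_mm_h=40.0, n_levels=15):
    """Eq. (1): (max rate / number of levels) * level.
    40/15 = 2.667, so level 12 gives 32.0 mm/h; Table 1 rounds the
    factor to 2.66 and lists 31.92."""
    return (max_rate_mm_h / n_levels) * level

def rain_category(level):
    """Digital rainfall levels as described above:
    0 clear, 1-5 light, 6-11 moderate, 12-14 heavy."""
    if level == 0:
        return "clear"
    if level <= 5:
        return "light"
    if level <= 11:
        return "moderate"
    return "heavy"

print(rain_intensity(12), rain_category(12))   # 32.0 heavy
```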
After generating the pixel values (rainfall levels), the intensity of each rainfall level is computed again by applying (1). We create a data model, as shown in Table 2, for the rainfall prediction system, which includes 15 features (R_Level0, R_Level1, ..., R_Level14) belonging to 15 different class types (R_Intensity). In this case, a radar image contains many pixels that denote different rainfall levels. Therefore, in the data model, each instance represents only one image. Thus, one image is a combination of 15 different aspects of the instance, from R_Level0 to R_Level14. In detail, R_Level0 indicates rainfall level 0, and its value is the total number of occurrences of rainfall level 0. R_Level1 likewise counts all occurrences of rainfall level 1 extracted from the same image, and so on. Now, we have created the simplest data model for rainfall prediction, as shown in Table 2.

Table 2. Rainfall level data model (features R_Level0, R_Level1, ..., R_Level14)
Fig. 6. Rainfall level of each pixel extracted from radar images.

6 Dynamic Programming Matching for Rainfall Series Prediction

The dynamic programming method, which originated with Needleman and Wunsch (1970), has become a very useful and powerful approach in a variety of applications in the field of computer science. The simple strategy underlying dynamic programming is to investigate the similarity between two sequences corresponding to the maximum match along a certain path. For the rainfall series prediction system, let us consider two sample rainfall series, Rainfall_S1 and Rainfall_S2, as shown in Figs. 7 and 8. Each of the two sample rainfall series has five images, and each image represents different rainfall levels. In our discussion, we often use the term node, which also represents an image of a rainfall series. For the two rainfall series, the similarity can be mathematically denoted as follows:

Similarity(Rainfall_Si, Rainfall_Sj) = Score(Optimal Alignment of Rainfall_Si and Rainfall_Sj)   (2)

As stated in (2), the similarity of two rainfall sequences is defined as the best or optimal part that has the highest alignment score among all alignments of the two rainfall sequences. The best optimal path represents the predicted rainfall series for a newcomer series.

Fig. 7. Sample rainfall series named Rainfall_S1.
Fig. 8. Sample rainfall series named Rainfall_S2.
Fig. 9. A sample graph for two rainfall sequences.

Figure 9 illustrates the construction of a directed graph G = (V, E), consisting of a set of nodes (V) connected by edges (E), to perform a Needleman-Wunsch alignment for two rainfall series. Each node owns two properties: one is the pointer to the corresponding node that gives the optimal sub-alignment, and the second is the alignment score. As a first step, to find the best alignment for each node, it is necessary to consider the similarity of three corresponding subsequences (Score1, Score2 and Score3). Score1 is the addition of the best score of node Node[i, j − 1] and Score(gap). Score2 is computed by adding the score of node Node[i − 1, j − 1] and Score(gap). Likewise, Score3 is the addition of the value of node Node[i − 1, j] and Score(match). Then, the alignment that has the highest score is selected as the best alignment for the current node. For finding the best alignment score, the following equations are given.
Score1 = Score(sub-alignment1) + Score(gap)   (3)
Score2 = Score(sub-alignment2) + Score(gap)   (4)
Score3 = Score(sub-alignment3) + Score(match)   (5)

where Score(gap) = −2, Score(matched pair) = Similarity(Image[i], Image[j]) and Score(mismatched pair) = −1. In detail, Score(gap) means that there is no value to match for the two nodes; Score(mismatched pair) means that the two nodes each have their own value, but the two values are not the same; and for Score(matched pair), the values of the two nodes are the same. In the rainfall series prediction system, we define the threshold (Similarity(Image[i], Image[j]) > 90%) for finding the similarity of two images. As discussed in the previous section, a rainfall image is the combination of 15 rainfall levels (R_Level0, R_Level1, R_Level2, ..., R_Level14). Thus, the similarity between two rainfall level images is defined in terms of the average distance over the 15 rainfall levels, as denoted in Eq. (6). If the similarity is greater than 90%, we assume that the two images are identical, i.e. a matched pair. The average similarity between two rainfall images can be defined in percentage terms by the following equation:

Similarity(Image_i, Image_j) = \frac{1}{15} \sum_{m=0}^{14} \left( 1 - \frac{|Image_i.Level[m] - Image_j.Level[m]|}{Image_i.Level[m] + Image_j.Level[m]} \right)   (6)

where the number of rainfall levels is 15 and |Image_i.Level[m] − Image_j.Level[m]| is the distance between Level[m] of Image_i and Image_j. The distance divided by the sum of Level[m] of Image_i and Image_j gives the distance between the two images in percentage terms. Consequently, the similarity is evaluated by subtracting this distance from 1. As a final result, the average similarity between Image_i and Image_j is obtained by summing the similarity over the 15 rainfall levels and dividing the result by the number of rainfall levels. If the similarity is greater than 90%, then the DP matching algorithm will take these two images into account in the matching process. Otherwise, it refuses to consider them in creating the optimal sub-alignment. In detail, the following step-by-step procedure illustrates how to evaluate the similarity between two images (Image_i, Image_j). Before going to the section on finding the best path, let us first observe the sub-alignment score of each node. As shown in Fig. 9, consider the process for Node[i = 3, j = 3], shown in red. The best alignment score for the current node is defined by selecting the highest score from the three sub-alignments (sub-alignment1, sub-alignment2 and sub-alignment3) of its immediate predecessors in creating the best path, where sub-alignment1 comes from Node[i, j − 1], sub-alignment2 is from Node[i − 1, j − 1], and sub-alignment3 is from Node[i − 1, j]. Then, Score1, Score2, and Score3 can be denoted by the following equations:

Score1 = Node[i, j − 1] − 2, if 0 ≤ i ≤ Series1.Length and 0 ≤ j ≤ Series2.Length   (7)
Score2 = Node[i − 1, j − 1] + Similarity(Image[i], Image[j]), if 0 ≤ i ≤ Series1.Length and 0 ≤ j ≤ Series2.Length   (8)
Score3 = Node[i − 1, j] − 2, if 0 ≤ i ≤ Series1.Length and 0 ≤ j ≤ Series2.Length   (9)

After that, the best alignment for Node[i, j] can be chosen by the following equation:

Score(Alignment[i, j]) = \max\{ Score1, Score2, Score3 \}   (10)

Now it is ready to find the best path for two rainfall series. The optimal path can be defined by backtracking through the nodes with the optimal sub-alignment, using the scores, as shown in Fig. 11. The final result is the optimal path that is most similar to a newcomer rainfall series. Figure 10 illustrates a sample best path for a rainfall series forecast in a visualization view.

Optimal Path(Rainfall_Si, Rainfall_Sj) = \sum_{i=1}^{j-1} Score(Node_i \rightarrow Node_{i+1})   (11)

Figure 11 demonstrates the best optimal path for the rainfall series Rainfall_S1 and Rainfall_S2, obtained by backtracking through the nodes that own the highest alignment score until the last node. The most probable predecessor is the diagonal match. The DP algorithm performs alignments with a time complexity of O(ij).

Fig. 10. Sample best optimal path for rainfall series forecast.
Fig. 11. Reconstructing the optimal path by backtracking through the nodes with the best alignment score.
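To make Eqs. (6)-(10) concrete, here is a minimal Python sketch of the scoring step (an illustration only, not the authors' Algorithm 1). It treats each image as its 15-element rainfall-level vector, uses the 90% similarity threshold and the mismatch score of −1 from the text together with a gap score of −2, and adds a small epsilon so Eq. (6) is defined when both levels are empty.

```python
import numpy as np

def image_similarity(img_i, img_j, eps=1e-9):
    """Eq. (6): mean over the 15 levels of 1 - |Li - Lj| / (Li + Lj)."""
    li, lj = np.asarray(img_i, float), np.asarray(img_j, float)
    return float(np.mean(1.0 - np.abs(li - lj) / (li + lj + eps)))

def dp_align_score(series1, series2, gap=-2.0, mismatch=-1.0, thresh=0.9):
    """Needleman-Wunsch style scoring of two rainfall series (Eqs. 7-10);
    backtracking from node[n, m] recovers the optimal path (Eq. 11)."""
    n, m = len(series1), len(series2)
    node = np.zeros((n + 1, m + 1))
    node[0, :] = np.arange(m + 1) * gap      # leading gaps
    node[:, 0] = np.arange(n + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = image_similarity(series1[i - 1], series2[j - 1])
            diag = sim if sim > thresh else mismatch   # matched vs. mismatched pair
            node[i, j] = max(node[i, j - 1] + gap,       # Score1, Eq. (7)
                             node[i - 1, j - 1] + diag,  # Score2, Eq. (8)
                             node[i - 1, j] + gap)       # Score3, Eq. (9)
    return node[n, m]
```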
The ?nal result is the optimal path that is the most similar to a new comer rainfall series. Figure 10 illustrates a sample best path for rainfall series forecast in visualization view. Optimal Path ðRainfall Si; Rainfall Sj ¼ Xj1 i¼1 ScoreðNotei ! Notei þ 1Þ ð11Þ Figure 11 demonstrates the best optimal path for rainfall series, Rainfall_S1 and Rainfall_S2 by using backtrack through the nodes which owns the highest alignment score until the last node. The most possible predecessor is the diagonal match. The DP algorithm performs alignments with a time complexity of O (ij). Fig. 10. Sample best optimal path for rainfall series forecast. Fig. 11. Reconstructing the optimal path using backtrack through the nodes with best alignment score. 540 S. S. Aung et al. Algorithm 1 illustrates the detail process of dynamic programming matching for creating the best rainfall series path between two rainfall sequences. An Attempt to Forecast All Different Rainfall Series 541 7 Polynomial Regression for Rainfall Series Prediction Polynomial regression is a model of nonlinear regression approach, which is useful to ?nd the characteristic of nonlinear relationship between the independent variable x and the dependent variable y. The polynomial regression is a popular approach for varieties of application areas, for example business and economic, weather and traf?c prediction systems [9]. For rainfall series prediction system, it has two properties (time and rainfall level) for each series. Here, we have a list of n rainfall series, S ¼ fs1; s2; s3; :::; sig; where i ¼ f1; 2; 3; :::; ng. Each rainfall series has different rainfall series length characterized by the following equation: si ¼ ðht1; r1i; ht2; r2i; ht3; r3i; ...; htk; rkiÞ ð12Þ Where k ¼ f1; 2; 3; :::; ng; tk represents time series and rk is rainfall level. To estimate different rainfall series, we generate different regression models for different rainfall series as a model bank. For each new comer, xi, the prediction process is taken through the R_Model bank. After that, the error estimation of each rainfall model is computed using least square error approach. As a ?nal step, the system made a decision of the best ?t rainfall series according to the information of error estimation model. Figure 12 illustrates the bock diagram for the detail process of rainfall series prediction system. The predicted value for the rainfall series using jth degree polynomial regression model can be written as f ðxÞ ¼ a0þ a1x þ a2x2 þ ::: þ ajxj ð13Þ Where j represents the degree of polynomial regression, aj are the regression coef?cients. The general least square error is given by er ¼ Xn i¼1 yi a0þ a1x þ a2x2 þ ::: þ ajxj : : 2 ð14Þ Fig. 12. Rainfall series prediction model. 542 S. S. Aung et al. Where, yi is the actual value, and Er is the least square error. For rainfall series system, a set of least square error Er c can be written as Er c ¼ ðer1; er2; :::; ernÞ: ð15Þ The best ?t line can be de?ned by choosing the minimized error from the set of least square error Er c : The best line ¼ select minimize error Er c r r ð16Þ 8 Experiment and Analysis Discussion In this section, we will discuss the experimentation of the rainfall series prediction system and the results that prove the ef?ciency of the new approach, dynamic pro-gramming by comparing with polynomial regression approach. For those results, the prediction accuracy is computed using a measurement of how close the actual value of observed rainfall series to the value of forecasted rainfall series. 
As a first step, we evaluate the forecast error by applying the following equations:

Error(Rainfall_Si) = |Actual Value(Rainfall_Si) − Forecast Value(Rainfall_Si)|   (17)

Error(Rainfall_Si)% = |Actual Value(Rainfall_Si) − Forecast Value(Rainfall_Si)| / Actual Value(Rainfall_Si)   (18)

Then, the accuracy of a rainfall series is evaluated by the following equation:

Accuracy(%) = 1 − Error(%)   (19)

In this case, if the error is larger than 100%, then the accuracy is set to 0%. For this experimentation, rainfall-level history data was provided by Weathernews Inc. Table 3 describes the data sizes for the years 2011 and 2013 from two aspects: the original size, and the size after the preprocessing stage, which includes noise filtering and converting images into a numerical format.

Table 3. Weather data size descriptions
Year | Original data size | Data size in the preprocessing stage
2011 | 3 GB | 4 MB
2013 | 8.006 GB | 10.9 MB

Table 4 describes the number of rainfall level images included in each rainfall series and the amount of processing time required for each rainfall series. As reported in Table 4, the more images a rainfall series contains, the more processing time it needs. For all rainfall series, the average processing time is 12338 ms. Table 5 illustrates the prediction accuracy for different rainfall series patterns using full-cross validation. This table has six columns. The first column is the name of the rainfall series. The second column is the actual data; in more detail, the actual data is the sum of the rainfall levels of all images in one rainfall series. The third column describes the forecast rainfall level values; this value is likewise the sum of the rainfall levels of all images of the forecast rainfall series. All rainfall series, except Series 3, Series 7, and Series 10, achieve acceptable accuracy. For Series 3, Series 7 and Series 10, the rainfall series stored in the databank have quite different rainfall level values. Thus, the algorithm is not able to retrieve a series pattern with 90% or greater similarity. If we train the algorithm with a larger case-bank, it will achieve better accuracy. However, the average forecast accuracy, 57%, confirms that the system is suitable for predicting different rainfall series. Table 6 gives the prediction accuracy of the rainfall series without using the full-cross validation approach. To put it another way, each rainfall series does not exclude itself when finding the most similar rainfall series in the case-bank. That is to say, the case-bank includes the most similar rainfall series to each series. Thus, in this experiment, each rainfall series achieves high prediction accuracy, with an average accuracy of 85%. Tables 7 and 8 present the prediction accuracy of the second approach, the polynomial regression algorithm, which finds a nonlinear relationship between time (t_k) and rainfall level (r_k). Table 7 states the prediction accuracy without using full-cross validation. In this study, the algorithm fails to reach acceptable accuracy on Series 2, 5, 7 and 8. Moreover, the estimation accuracy using full-cross validation can
Table 4.
Number of images description included in each series Rainfall series Number of images Amount time for prediction (millisecond) Series 1 632 11976 Series 2 317 6227 Series 3 1013 24649 Series 4 875 20567 Series 5 285 5423 Series 6 571 10648 Series 7 1086 28737 Series 8 292 5700 Series 9 857 18614 Series 10 128 4135 Series 11 288 5920 Series 12 244 5467 Total = 6588 Average = 12338.58333 544 S. S. Aung et al. be seen in Table 8. In this case, Series 2, 5, 7, 8, and 10 could not be performed well. Therefore, the total average accuracy of polynomial regression approach is 67% without full-cross validation and 54% with full-cross validation approach. Table 5. Accuracy of rainfall series using full-cross validation Rainfall series name Actual Forecast Error Error (%) Accuracy (%) Series 1 67800 44023 23777 35.06932 64.93067847 Series 2 14899 13258 1641 11.01416 88.98583798 Series 3 77502 500829 423327 546.2143 0% Series 4 86778 84935 1843 2.12381 97.87618982 Series 5 67726 34550 33176 48.98562 51.01438148 Series 6 109866 158971 49105 44.69536 55.30464384 Series 7 453628 83260 370368 81.64575 18.35424621 Series 8 23548 16228 7320 31.08544 68.9145575 Series 9 81869 95428 13559 16.56182 83.43817562 Series 10 3081 586 2495 80.9802 19.01979877 Series 11 7790 10945 3155 28.82595 71.17405208 Series 12 5787 3931 1856 32.07189 67.92811474 Average accuracy 57.24505637 Table 6. Accuracy of dynamic programming without using full-cross validation Rainfall series name Actual Forecast Abs (error) Error (%) Accuracy (%) Series 1 67800 67326 474 0.00699115 99 Series 2 14899 11926 2973 19.95436 80 Series 3 77502 86357 8855 11.42551 89 Series 4 86778 100456 13678 15.76206 84 Series 5 67726 64188 3538 5.223991 95 Series 6 109866 105300 4566 4.155972 96 Series 7 453628 528509 74881 16.50714 83 Series 8 23846 21845 2001 8.391344 92 Series 9 81869 91250 9381 11.45855 89 Series 10 3081 1689 1392 45.18014 55 Series 11 7790 6635 1155 14.8267 85 Series 12 5787 4161 1626 28.09746 72 Average accuracy 85% An Attempt to Forecast All Different Rainfall Series 545 As reported by comparative study of dynamic programming and polynomial regression for rainfall series forecast, dynamic programming approach is more suitable prediction approach for the whole rainfall series than polynomial regression approach. 9 Conclusion In this study, we proposed a new predictive approach, dynamic programming algorithm aiming to forecast the different rainfall series pattern for the whole rainfall life cycle, not for each stage of rainfall series. As we know, the dynamic programming algorithm Table 7. Accuracy of polynomial regression without using full-cross validation Rainfall series name Actual Forecast Error Abs (error) Error (%) Accuracy (%) Series 1 67800 86778 -18978 18978 27.99115 72% Series 2 14899 5787 9112 9112 61.15847 39% Series 3 77502 86778 9276 9276 11.96872 88% Series 4 86778 86778 0 0 0 100% Series 5 67726 5787 61939 61939 91.45528 9% Series 6 109866 86778 23088 23088 21.01469 79% Series 7 453628 86778 -366850 366850 80.87023 19% Series 8 23846 5787 18059 18059 75.73178 24% Series 9 81869 86778 4909 4909 5.996165 94% Series 10 3081 3081 0 0 0 100% Series 11 7790 5787 2003 2003 25.71245 74% Series 12 5787 5787 0 0 0 100% Average accuracy 67% Table 8. 
Accuracy of polynomial regression using full-cross validation Actual Forecast Error Abs€ Error (%) Accuracy (%) Series 1 67800 86778 -18978 18978 27.99115 72 Series 2 14899 5787 9112 9112 61.15847 39 Series 3 77502 86778 9276 9276 11.96872 88 Series 4 86778 77502 -9276 9276 10.68935 89 Series 5 67726 5787 61939 61939 91.45528 9 Series 6 109866 86778 23088 23088 21.01469 79 Series 7 453628 86778 -366850 366850 80.87023 19 Series 8 23846 5787 18059 18059 75.73178 24 Series 9 81869 86778 4909 4909 5.996165 94 Series 10 3081 5787 -2706 2706 87.82863 12 Series 11 7790 5787 2003 2003 25.71245 74 Series 12 5787 3081 2706 2706 46.75998 53 Average accuracy 54 546 S. S. Aung et al. is a powerful approach for solving the time series problems. The approach is also popular for DNA and the amino acid sequence of two proteins. Actually, DP matching also covers almost research areas. To that end, for the rainfall series problem, the DP matching came to our attention as an approach to predict the different rainfall cycles. Furthermore, we also apply polynomial regression approach to rainfall series estima-tion to demonstrate and prove that dynamic programming is more ef?cient. In agree-ment with the experiment results as stated in Tables 5, 6, 7 and 8, DP matching achieved a higher prediction accuracy than conventional approach, polynomial regression. Supposing this research is in progress contending to forecast all different rainfall series, only a prediction have been executed over 2011 and 2013 datasets that are obtainable at this moment. For our future works, we will collect more rainfall series from different years and then apply DP matching algorithm using massive case-banks for proving that the ef?cient of algorithm with stronger con?rmation for different rainfall level pattern prediction. References 1. Gilbeaux, K.: Global resilience system, Typhoon Neoguri—Flooding in Nago, Okinawa, Wed, 2014-07-09. https://resiliencesystem.org/typhoon-neoguri-?ooding-nago-okinawa 2. Kusiak, A., Wei, X., Verma, A.P., Roz, E.: Modeling and prediction of rainfall using radar reflectivity data: a data-mining approach. IEEE Trans. Geosci. Remote Sens. 51(4), 2337– 2342 (2013) 3. Ingsrisawang, L., Ingsriswang, S., Somchit, S., Aungsuratana, P., Khantiyanan, W.: Machine learning techniques for short-term rain forecasting system in the northeastern part of Thailand. In: World Academy of Science, Engineering and Technology, vol. 2, no. 5 (2008). International Journal of Computer and Information Engineering 4. Aaung, S.S., Senaha, Y., Ohsawa, S., Nagayama, I., Tamaki, S.: Short-term prediction of localized heavy rain from radar imaging and machine learning. IEIE Trans. Smart Process. Comput. 7, 107–115 (2018) 5. Inafuku, S., Tamaki, S., Hirata, T., Ohsawa, S.: Guerrilla rainstorm prediction of using a state transition. In: Proceedings of Japan Wind Energy Symposium, vol. 35, pp. 375–378 (2016) 6. Needleman, S.B., Wunsch, C.D.: A general method application to the search for similarities in the amino acid of two proteins. J. Mol. Biol. 48(3), 443–453 (1970) 7. Brown, K.Q.: Dynamic Programming in Computer Science. Department of Computer Science, Carnegie-Mel Ion University, Pittsburgh (1979) 8. Kusabiraki, C.: Weathernews Inc, June 11, 1986. https://global.weathernews.com/ infrastructure/with-radar/ 9. Ostertagov, E.: Modelling using polynomial regression. Proc. Eng. 48, 500–506 (2012) An Attempt to Forecast All Different Rainfall Series 547 Non-subsampled Complex Wavelet Transform Based Medical Image Fusion Sanjay N. 
Talbar1 , Satishkumar S. Chavan2(?) , and Abhijit Pawar3 1 SGGS Institute of Engineering and Technology, Nanded 431606, MS, India sntalbar@yahoo.com 2 Don Bosco Institute of Technology, Kurla (W), Mumbai 400070, MS, India satyachavan@yahoo.co.in 3 SKN Medical College and General Hospital, Narhe, Pune 411041, MS, India abhijitpawar.rad@gmail.com Abstract. The paper presents a feature based medical image fusion approach for CT and MRI images. The directional features are extracted from co-registered CT and MRI slices using Non-Subsampled Dual Tree Complex Wavelet Trans- form (NS DT-CxWT). These features are combined using average and maxima fusion rules to create composite spectral plane. The new visually enriched image is reconstructed from this composite spectral plane by applying inverse transfor- mation. Such fused images are evaluated for its visual quality using subjective and objective performance metrics. The quality of fused image is rated by three radiologists in subjective evaluation whereas edge and similarity based fusion parameters are computed to estimate the quality of fused image objectively. The proposed algorithm is compared with the state of the art wavelet transforms. It provides visually enriched fused images retaining soft tissue texture of MRI along with bone and lesion outline from CT with better contrast for lesion visualization and treatment planning. It is also found that the average score by radiologists is ‘3.85’ for proposed algorithm which is much higher than that of the average score for other wavelet algorithms. Keywords: Medical image fusion · Non-subsampled complex wavelet transform Dual Tree Complex Wavelet Transform · Discrete Wavelet Transform Radiotherapy · Fusion parameters 1 Introduction Medical imaging is extensively used in disease diagnosis and treatment since last two decades. Major imaging modalities are Ultrasound Guided Imaging (USG), Computed Tomography (CT), and Magnetic Resonance Imaging (MRI) along with functional MRI (fMRI), Positron Emission Tomography (PET), and Single-Photon Emission Computed Tomography (SPECT). Every modality imaging has its own advantages and disadvan- tages like CT captures calci?cations, implants, and bone structures prominently whereas MRI provides better visualization of soft tissues and lesions [1]. No single modality provides all relevant clinical information together. Therefore, there is a need to develop techniques which will bring important clinical information of two or more modalities © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 548–556, 2019. https://doi.org/10.1007/978-3-030-02686-8_41 in a single frame. Such techniques which aid the radiologists in disease diagnosis and treatment planning are called multimodality medical image fusion. The acquisition process of these modalities is also completely di?erent which makes them complemen- tary modalities for the fusion. Medical image fusion has signi?cant role in treatment of cancer using radiation therapy. The treatment uses CT as main modality whereas MRI is preferred as a comple- mentary modality. The delineation of infected cells or tissues is obtained using both CT & MRI and planning of radiation procedure is done using CT. Obviously, it is a great help to medical physicist to have both CT and MRI information together in a single frame for delineation. This will help radiation oncologist to prepare precise treatment plan for treating the cancer patients in a best possible way. 
In fusion system, source modalities can be varied over large number of acquisition processes. The source modality images have complementary structural representations. Many techniques and algorithms were proposed in the literature for the fusion [2]. Two major categories of fusion techniques are spatial domain and frequency domain techni- ques. Fusion process is also broadly divided into point wise fusion, feature based fusion, and parametric mapping of decision fusion. Point wise fusion is simpler and combines information point to point, feature level fusion extracts and merges features, and decision level fusion selects and maps the relevant information for creating new image. As per literature, pyramid and wavelet based Multiresolution Analysis (MRA) approaches are extensively used for medical image fusion [3]. However, wavelet based methods showed superior results as wavelets decompose the source images into frequency sub-bands which give an edge over the pyramid transforms. Discrete Wavelet Transform (DWT) provides spatio-spectral localization, better directional sensitivity with good signal-to-noise ratio. It is preferred transform by many researchers for medical image fusion [4–7]. However, fused images may have distortions and visual inconsis- tencies due to demerits of DWT like limited directional selectivity, oscillations, no phase information, etc. Recently, complex wavelet transform is also preferred over DWT. Dual Tree Complex Wavelet Transform (DT-CxWT), Daubechies Complex Wavelet Trans- form (DCxWT) and M-Band Wavelet Transform (MBWT) have used for fusion process due to their directional sensitivity and phase information [8–10]. Edge based techniques like contourlet transform [11], curvelet transform [12], shearlet transform [13], ripplet transform [14] have also gained much attention in medical image fusion. Redundancy Discrete Wavelet Transform (RDWT) also performs better due to its shift invariance property [15]. Soft computing approaches like arti?cial neural network, fuzzy logic, neuro-fuzzy, etc. are also preferred for medical image fusion [16]. However, retaining visual content in fused images is still a challenge which requires development of new fusion schemes. In this paper, new fusion scheme is proposed which uses Non-Subsampled Dual Tree Complex Wavelet Transform (NS DT-CxWT) to extract directional features from source CT and MRI images. These features in spectral space are combined using fusion rules like averaging of low frequency coe?cients and selection of maximum valued high frequency coe?cients. The proposed fusion scheme is described in Sect. 2 along with conceptual background of NS DT-CxWT and fusion rules. The experimental results and Non-subsampled Complex Wavelet Transform 549 analysis of fused images using subjective and objective evaluation metrics are presented in Sect. 3 which is followed by conclusion and future scope in Sect. 4. 2 Proposed Fusion Scheme The medical image fusion is a process of merging the relevant and complementary clin- ical information into new visually enriched fused image [5]. Figure 1 shows the proposed fusion scheme in which the directional spectral features are extracted using NS DT- CxWT. The source images are co-registered CT and MRI slices of same anatomical structure of the same patient. The selection of appropriate frames from the source modalities are done by radiologists. These selected frames of CT and MRI are registered for pixel alignment using geometric transformations like scaling, translation and rota- tion. 
The e?ectiveness of fusion process depends on the registration process. Fig. 1. Proposed medical image fusion scheme. The directional features of CT and MRI are combined using fusion rules resulting new spectral plane. The inverse NS DT-CxWT is applied to reconstruct the fused image from this new feature plane. The fused images are tested for their visual quality subjec- tively with the help of radiologists. The fusion parameters are also calculated to evaluate the fused images for their visual quality and preservation of anatomical structures from the source images. The novelty of this paper is the feature extraction using NS DT- CxWT and fusion rules which are discussed in the following subsections. 2.1 Discrete Wavelet Transform Discrete Wavelet Transform (DWT) is widely used technique for subband decomposi- tion of images. It converts image into four subbands at ?rst level of decomposition i.e. approximate (A1), horizontal (H1), vertical (V1), and diagonal (D1) subbands as shown in Fig. 2(a). A1 provides textural information and other subbands give three discontinu- ities as (0°, 90°, and ±45°) as shown in Fig. 2(b). However, DWT represents combined features in +45° and -45° orientations. It also su?ers due to less directionality, aliasing, oscillations at discontinuities, and shift variance [17]. 550 S. N. Talbar et al. Fig. 2. Discrete wavelet transform (a) First level decomposition (b) Corresponding fourier representation provides information as A1: textural, H1: 0°, V1: 90°, D1: ±45°. 2.2 Non-subsampled Dual Tree Complex Wavelet Transform Dual Tree Complex Wavelet Transform (DT-CxWT) is designed using real coe?cients in two tree structures resulting a complex nature. Real and imaginary parts of DT-CxWT are used in Tree ‘a’ and Tree ‘b’, respectively. The complex representation of DT-CxWT is given in the form of ‘a + jb’. DT-CxWT is nearly shift invariant, provides phase information, and exhibits high directional selectivity [17]. Figure 3 shows three levels of decomposition of NS DT-CxWT. Here, h0[n] & h1[n] are low pass ?lter coe?cients and g0[n] & g1[n] are high pass ?lter coe?cients in tree ‘a’ and ‘b’, respectively. After ?ltering using low pass and high pass ?lters, conventional down sampling operation is eliminated in every level to make DT-CxWT as Non-Subsampled DT-CxWT. Fig. 3. Three levels of decomposition by NS DTCxWT used in proposed medical image fusion scheme. NS DT-CxWT has six wavelets that are computed using (1) and (2). Here, ??a i (m, n) and ??b i+3 (m, n), i = 1, 2,3 are ?lter coe?cients which provides feature representations oriented in six directions as (±15°, ±45°, ±75°) after decomposition [17]. Thus, NS DT- CxWT has an edge over the other transforms in terms of high directional selectivity. The spectral directional representation for two levels of decomposition with six orien- tations is shown in Fig. 4. The non-subsampling avoids the loss of information. Non-subsampled Complex Wavelet Transform 551 ??a i (m, n) = v) 2 1 ( ??1,i( m, n) -) ??2,i( m, n) ) (1) ??b i+3 (m, n) = v) 2 1 ( ??1,i( m, n) + ??2,i( m, n) ) (2) Fig. 4. Fourier spectrum of NS DT-CxWT representing six distinct orientations. The merits of the proposed fusion scheme using NS DT-CxWT are the directional selectivity, phase information, shift invariance, and redundant content with same compu- tational complexity as DT-CxWT. It also supports in the selection of appropriate features to create composite spectral space. 
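For comparison with the plain DWT sketch earlier, the dual-tree complex wavelet decomposition with six oriented subbands per level can be sketched with the open-source dtcwt package. This is an assumption for illustration only: the package implements the standard decimated DT-CWT, whereas the transform used in this paper additionally omits the downsampling step.

```python
import numpy as np
import dtcwt  # assumption: third-party package implementing the (decimated) DT-CWT

image = np.random.rand(256, 256)                       # stand-in for a registered slice
pyramid = dtcwt.Transform2d().forward(image, nlevels=3)
lowpass = pyramid.lowpass                              # real-valued low-frequency subband
for level, hp in enumerate(pyramid.highpasses, 1):
    # complex coefficients; the last axis holds the six orientations
    # (approximately +/-15, +/-45 and +/-75 degrees)
    print(level, hp.shape)
```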
2.3 Fusion Rules

The source CT and MRI images are decomposed into three levels using the separable NS DT-CxWT. This results in two low-frequency subbands and six high-frequency subbands. Low-frequency subband coefficients are averaged, and the maximum-valued high-frequency coefficient is selected, using (3), to create the composite spectral space. Here, CP is the composite plane, t stands for tree 'a' or 'b', and K represents a particular subband (A, V, H, D). The inverse NS DT-CxWT is applied to this composite plane to reconstruct the fused image.

CP_t^K(u, v) = \begin{cases} \alpha\, CT_t^K(u, v) + (1 - \alpha)\, MRI_t^K(u, v), & \alpha = 0.5 \\ CT_t^K(u, v), & CT_t^K(u, v) > MRI_t^K(u, v) \\ MRI_t^K(u, v), & MRI_t^K(u, v) \ge CT_t^K(u, v) \end{cases}   (3)

3 Experimental Results and Discussion

The proposed fusion scheme is tested for its performance on a database of 29 study sets of CT and MRI of the same patient. Eighteen sets were captured using a Siemens Somatom Spirit CT scanner and a Siemens 1.5 T Magnetom C1 MRI machine, respectively, and 11 study sets were taken from the website 'https://radiopaedia.org/'. The radiologists selected slices based on anatomical markers. This was followed by geometric transformation to register them for pixel/voxel alignment. Sample study sets of CT and MRI are presented in Figs. 5(a–c) and (d–f), respectively. A personal computer having an Intel i5 processor (2.50 GHz) and 4 GB RAM was used for all the computations in MATLAB 2013a.
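A minimal sketch of the fusion rules in Eq. (3), using plain NumPy on already-computed coefficient arrays (illustrative only; for the complex-valued highpass coefficients the comparison would typically be made on magnitudes, a detail Eq. (3) leaves implicit).

```python
import numpy as np

def fuse_subbands(ct_low, mri_low, ct_high, mri_high, alpha=0.5):
    """Eq. (3): average the low-frequency subband coefficients and keep
    the larger high-frequency coefficient at each position. All inputs
    are coefficient arrays of equal shape."""
    fused_low = alpha * ct_low + (1.0 - alpha) * mri_low
    fused_high = np.where(ct_high > mri_high, ct_high, mri_high)
    return fused_low, fused_high
```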
Study set Algorithm En FusFac EQ mSSIM Set 1 DWT [6] 3.0887 3.8972 0.6871 0.6387 SWT [14] 3.1087 3.9213 0.7021 0.6377 NSCT [10] 3.1127 4.1252 0.7256 0.6646 DTCxWT [8] 3.1295 4.3586 0.7241 0.6574 Proposed 3.1985 5.8546 0.7883 0.7147 Set 2 DWT [6] 2.8476 4.3331 0.7164 0.5449 SWT [14] 2.5687 4.9647 0.7365 0.5598 NSCT [10] 2.9561 5.1243 0.7198 0.5836 DTCxWT [8] 2.8814 5.6574 0.7483 0.6054 Proposed 3.2149 6.0148 0.7928 0.6681 Set 3 DWT [6] 3.1125 3.6550 0.8605 0.6905 SWT [14] 3.3285 3.6805 0.8925 0.6207 NSCT [10] 3.5593 3.9871 0.8766 0.6982 DTCxWT [8] 3.7899 4.0153 0.8672 0.6879 Proposed 4.1106 5.1589 0.9056 0.7354 Three radiologists evaluated the quality of fused images subjectively. The fused images are compared with source images in terms of anatomical similarity, contrast, false content, and usefulness of fused images in delineation of infected cells or tumour. All the fused images are rated on the scale of 0 (poor) and 4 (excellent) by radiologists. The average score of subjective analysis of the fused images with various fusion algo- rithms is tabulated in Table 2. The average score for the proposed algorithm is ‘3.85’ which is higher than compared techniques. It proves that the fused images using proposed algorithm are useful in delineation and contouring of tumour for radiation therapy. Figure 5 shows fused images of three sample study sets using various wavelet techniques. Table 2. Subjective evaluation of fused images by Radiologists. S. N. Algorithm Subjective score by radiologists #1 #2 #3 Average 1 DWT [6] 2.50 2.80 2.70 2.67 2 SWT [14] 2.70 3.00 3.20 2.97 3 NSCT [10] 2.90 3.10 3.30 3.10 4 DT-CxWT [8] 3.10 3.30 3.40 3.27 5 Proposed 3.65 3.81 4.10 3.85 4 Conclusion and Future Scope The fusion scheme presented in this paper is a feature based approach in spectral domain using NS DT-CxWT. It provides multiscale and multiresolution representation with six directional selectivity, shift invariance, and phase information with reduced 554 S. N. Talbar et al. computational complexity. The fused images using proposed scheme are useful in better visualization of the abnormality or lesions for treatment planning in radiation therapy. Fusion rules take care of textural preservation and better representation of discontinuities which result in retaining actual anatomical structures in the fused images. The subjective score for the quality of fused images using the proposed scheme indicates the excellent visual quality and proves its usefulness in treatment planning. The objective parameters also exhibit superior fusion metrics for the proposed algorithm when compared with the other wavelet based fusion algorithms. The quality of fused images can be further improved by modifying fusion rules with the help of iterative fusion schemes like neural network, fuzzy logic, neuro-fuzzy, genetic algorithms, etc. References 1. Kessler, M.L.: Image registration and data fusion in radiation therapy. Br. J. Radiol. 79(1), S99–S108 (2006) 2. James, A.P., Dasarathy, B.V.: Medical image fusion: a survey of the state of the art. Inf. Fusion 19, 4–19 (2014) 3. Pajares, G., Cruz, J.M.: A wavelet-based image fusion tutorial. Pattern Recognit. 37(9), 1855– 1872 (2004) 4. Qu, G.H., Zhang, D.L., Yan, P.F.: Medical image fusion by wavelet transform modulus maxima. Opt. Express 9(4), 184–190 (2001) 5. Chavan, S.S., Talbar, S.N.: Multimodality image fusion in the frequency domain for radiation therapy. In: International Conference on Medical Imaging, m-Health and Emerging Communication Systems (MedCom), Noida, pp. 
174–178. IEEE (2014) 6. Yang, Y., Park, D.S., Huang, S., Rao, N.: Medical image fusion via an e?ective wavelet based approach. EURASIP J. Adv. Signal Process. Article ID-579341, 13 (2010) 7. Chavan, S.S., Pawar, A, Talbar, S.N.: Multimodality medical image fusion using rotated wavelet transform. In: 2nd International Conference on Communication and Signal Processing (ICCASP - 2016). Advances in Intelligent Systems Research, vol. 137, pp. 627– 635, Atlantic Press (2016) 8. Singh, R., Srivastava, R., Prakash, O., Khare, A.: Multimodal medical image fusion in dual tree complex wavelet transform domain using maximum and average fusion rules. J. Med. Imaging Health Inform. 2, 168–173 (2012) 9. Singh, R., Khare, A.: Fusion of multimodal medical images using Daubechies complex wavelet transform - a multiresolution approach. Inf. Fusion 19, 49–60 (2014) 10. Chavan, S.S., Talbar, S.N.: Multimodality medical image fusion using M-band wavelet and Daubechies complex wavelet transform for radiation therapy. Int. J. Rough Sets Data Anal. 2(2), 1–23 (2015) 11. Shanmugam, G.P., Bhuvanesh, K.: Multimodal medical image fusion in non-subsampled contourlet transform domain. Circuits Syst. 7, 1598–1610 (2016) 12. Chen, M.S., Lin, S.D.: Image fusion based on curvelet transform and fuzzy logic. In: 5th International Conference on Image and Signal Processing (CISP), pp. 1063–1067. IEEE (2012) 13. Wang, L., Li, B., Tian, L.F.: Multimodal medical image fusion using the interscale and intra-scale dependencies between image shift-invariant shearlet coe?cients. Inf. Fusion 19, 20–28 (2014) 14. Das, S., Chowdhury, M., Kundu, M.K.: Medical image fusion based on ripplet transform type-I. Prog. Electromagn. Res. B 30, 355–370 (2011) Non-subsampled Complex Wavelet Transform 555 15. Singh, R., Vatsa, M., Noore, A.: Multimodal medical image fusion using redundant discrete wavelet transform. In: Advances in Pattern Recognition, pp. 232–235 (2009) 16. Das, S., Kundu, M.K.: A neuro-fuzzy approach for medical image fusion. IEEE Trans. Biomed. Eng. 60, 3347–3353 (2013) 17. Selesnick, I.W., Baraniuk, R.G., Kingsbury, N.G.: The dual-tree complex wavelet transform. IEEE Signal Process. Mag. 22(6), 123–151 (2005) 556 S. N. Talbar et al. Predicting Concussion Symptoms Using Computer Simulations Milan Toma(B) Computational Bio-FSI Laboratory, College of Engineering and Computing Sciences, Department of Mechanical Engineering, New York Institute of Technology, Northern Boulevard, Old Westbury, NY 11568, USA tomamil@tomamil.eu http://www.tomamil.com Abstract. The reported rate of concussion is smaller than the actual rate. Less than half of concussion cases in high school football players is reported. The ultimate concern associated with unreported concus-sion is increased risk of cumulative e?ects from recurrent injury. This can, partially, be attributed to the fact that the signs and symptoms of a concussion can be subtle and may not show up immediately. Com-mon symptoms after a concussive traumatic brain injury are headache, amnesia and confusion. Computer simulations, based on the impact force magnitude, location and direction, are able to predict these symptoms and their severity. When patients are aware of what to expect in the coming days after head trauma, they are more likely to report the signs of concussion, which decreases the potential risks of unreported injury. 
In this work, the ?rst ever ?uid-structure interaction analysis is used to simulate the interaction between cerebrospinal ?uid and comprehensive brain model to assess the concussion symptoms when exposed to head trauma conditions. Keywords: Head injury · Concussion · Fluid-structure interaction Simulations 1 Introduction In 1981, Goldsmith’s letter to the editor states, “The state of knowledge con-cerning trauma of the human head is so scant that the community cannot agree on new and improved criteria even though it is generally admitted that present designations are not satisfactory” [1]. Even decades later, this assessment can still be considered reasonable to a degree. The head model presented here is the only model currently incorporating cerebrospinal ?uid (CSF) ?ow. Other reported head models treat CSF as a solid part incapable of ?owing around the brain when exposed to head trauma condi-tions [2–6]. The CSF ?ows even on its own when the head is at rest, albeit slowly. Obviously, when the head is exposed to a sudden stop, e.g. in a car accident, a c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 557–568, 2019. https://doi.org/10.1007/978-3-030-02686-8_42 558 M. Toma the CSF ?ow around the brain has a signi?cant contribution to the head injury mechanism. Without the ?ow the simulated cushioning e?ect of CSF can not be considered realistic. The most common reasons for concussion not being reported include a player not thinking the injury is serious enough to warrant medical attention (66.4% of unreported injuries), motivation not to be withheld from competition (41.0%), and lack of awareness of probable concussion (36.1%) [7]. Regardless of the rea-son, as McCrea et al. state, “Future prevention initiatives should focus on educa-tion to improve athlete awareness of the signs of concussion and potential risks of unreported injury.”, [7]. Needless to say, there is an unlimited number of trauma situations that can occur, and the concussion symptoms can vary from one case to another. In some cases, the skull is dented inward and it presses against the surface of the brain. These types of fractures occur in 11% of severe head injuries. In impact sports, the skull dentation rarely occurs. Most sport-related brain injuries result from coup-contrecoup type of injury. Coup-contrecoup injury is dual impacting of the brain into the skull; coup injury occurs at the point of impact; coun-trecoup injury occurs on the opposite side of impact, as the brain rebounds, see Fig. 1. Most common causes of coup-contrecoup brain injury include circum-stances when the head jerks violently, e.g. during motor vehicle accidents, when baseball players are colliding during the chase for a ball, football players tackling, boxers punching, and so on. Fig. 1. Coup-contrecoup injuries, brain shifts inside the skull resulting in injuries at point of impact and away from point of impact, e.g. forehead injury can result in additional injury to occipital area. The brain is composed of three main structural divisions, namely the cere-brum, cerebellum, and brainstem. The cerebrum is divided into two cerebral hemispheres connected by the corpus callosum and shared ventricular system. The CSF ?lls a system of cavities at the center of the brain, known as ventricles, Simulating Concussion Symptoms 559 and the subarachnoid space surrounding the brain and spinal cord (Fig. 2). The CSF cushions the brain within the skull and serves as a shock absorber for the central nervous system [8,9]. Fig. 
2. The schematic of the cerebrospinal ?uid in which the brain is submerged. The 3D computational model used is designed based on this schematic. 2 Methods The methods section describes the creation of the head model, loading conditions used for its validation, and numerical and computational methods used. A. Head Model The ?ve anatomical structures used in this study are shown in Fig. 3. They all have unique material properties. This patient-speci?c model is based on the Digital Imaging and Communications in Medicine (DICOM) images acquired from an online database. The skin, spinal cord, meninges, and the arachnoid granulation, are the anatomical features missing in this model. When compared to the very short impact impulse time history used in these simulations, the CSF ?ow in the head can be neglected, too. The CSF ?ow speed, 0.05–0.08 m·s-1 , is relatively slow compared to the speed of an impact leading to traumatic brain injuries, i.e. during the impact impulse time history the CSF ?ows by 0.2–0.3 mm. Based on these assumptions, the presence of the granulations can be neglected, too. B. Loading Conditions Based on whether the head is stationary and struck by a moving object, or is moving and strikes a stationary object, the type of brain injury di?ers, according to [10]. The stationary head is usually hit by objects which are of similar mass to the head. In this study, the scenario in Fig. 1 is used and it is assumed that the impacting object does not penetrate the skull. Thus, local deformation of the 560 M. Toma Fig. 3. The entire head model with skull, cerebrum, cerebellum, pituitary gland and brainstem, respectively. Fluid particles (blue dots surrounding the brain model, in the lower right corner) ?ll the entire subarachnoid space and other cavities. skull in the frontal area is not resulting in direct contact injury to the underlying brain tissue. It has been estimated that for a contact area of approximately 6.5 cm2 the force required to produce a clinically signi?cant skull fracture in the frontal area of the cadaver skull is twice that required in the temporoparietal area [11]. Corresponding loading conditions from cadaveric experiments in [12] are used to perform the computational analysis of a frontal impact. The experiments examined the blow to the head of a seated human cadaver. The impact pulse history applied to the skull of the computational model is shown in Fig. 4. Simulating Concussion Symptoms 561 Fig. 4. Impact impulse time history used to simulate the cadaveric experiments in [12] and applied to the skull in the current model. C. Computer Simulations As stated above, the model is comprised of ?ve parts. Rigid material properties with density 1900 kg·m-3 [13] are assigned to the skull part. A non-linear elas-tic constitutive material model with varying material properties based from the literature [14–18] is used to simulate the cerebrum, cerebellum, pituitary gland, and brainstem. The cerebrum is composed of 96,385 tetrahedral elements. Sim-ilarly, the cerebellum, brainstem, and pituitary gland are composed of 40,808, 18,634 and 310 tetrahedral elements, respectively. The smoothed-particle hydro-dynamics (SPH) method is used to model the CSF. The bulk modulus of 21.9 GPa [3] and density 1000 kg·m-3 [19] are used for the CSF. The subarachnoid space between the skull and brain, and other cavities, are ?lled with 94,690 ?uid particles. The IMPETUS Afea SPH Solver (IMPETUS Afea AS, Norway) was used R to solve the ?uid motion and boundary interaction calculations. 
Simultaneously, the IMPETUS Afea Solver was used to solve the large deformations calcula- R tions in the solid parts. In both the solvers, for parallel processing, a commodity GPU was used. To remove the possibility of hourglass modes and element inver-sion that plagues the classic under-integrated elements, all solid elements were fully integrated. An explicit integration scheme was used for both the ?uid and solid domains and their interaction. A standard “under the table” workstation was used for all simulations. Tesla K40 GPU with 12 GB of Graphic DDR memory and 2880 CUDA Cores were used to achieve the parallel acceleration. H-re?nement of the ?nite element mesh was performed to con?rm that conver-gence was reached. The solutions were found to yield same results with both the mesh size of our choice and mesh size of higher number of elements. Simi-larly, a higher number of ?uid particles is used to obtain results within 5% of the values obtained with the smaller number of particles. This con?rmed that the results are converged. The SPH equations in greater detail can be found in our prior publication [20]. This study used the SPH method rather than the 562 M. Toma traditional FSI techniques because the latter can be computationally expensive and challenging regarding their parallelization [21]. Geometrical simpli?cations would need to be used in order to use traditional FSI methods. Consequently, the anatomical accuracy of the model would have to be sacri?ced. Besides, recently the SPH has been increasingly used in biomedical applications by other research groups as well [22]. 3 Results The results section shows validation of the simulations matching coup and con-trecoup responses in CSF with experimental results. The stress values on the cerebrum resulting from the frontal impact are shown and SPH impulse inten-sity is superimposed with the Boadmann’s map of cytoarchitectonics. A. Validation The loading conditions from cadaveric experiments (Fig. 4) applied to the frontal lobe yield corresponding coup and contrecoup pressure responses in CSF, see Fig. 5 where both experimental [12] and computational results are shown for comparison. B. Second Deviatoric Principal Stress The stress values on the cerebrum resulting from the frontal impact are shown in Fig. 6. The stress maxima can be found also on the occipital lobe which supports the experimental observations that forehead injury can result in additional injury to occipital area. Similar conclusion, i.e. stresses and strains seen in both frontal and occipital lobes, is also found in other more simpli?ed computational studies, e.g. [5]. Similar results, i.e. high stress values, are found also on the parietal lobe (Fig. 7). Moreover, here it is possible to make an additional observation that they only occur on the posterior aspects of the gyri. C. SPH impulse intensity In biomedical ?uid mechanics, the wall shear stress is often used to describe the e?ect the ?uid ?ow has on the surrounding structure. However, that variable is challenging to derive when using SPH methods. Instead, SPH can provide di?erent variable with similar meaning. For example, SPH impulse intensity, i.e. SPH driven mechanical impulse per unit area in pascal-second, has similar properties as wall shear stress. The SPH impulse intensity at peak impact impulse is shown in Fig. 8 [25]. At ?rst, the SPH impulse intensity develops slowly. And, eventually, it reaches its maximum values around the peak. 
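As an illustration only (not part of the original analysis), the impulse-intensity idea can be sketched as the contact force exerted by the fluid particles on each surface element, integrated over the impact and normalised by the element area, giving units of Pa·s. The array layout and the uniform time step below are assumptions for the sketch, not the output format of the IMPETUS Afea solver.

```python
import numpy as np

# Hedged sketch: time-integrated fluid-to-wall contact force per unit
# area ("SPH impulse intensity", Pa*s), used here as an analogue of
# wall shear stress. Shapes and values are synthetic placeholders.

def sph_impulse_intensity(contact_forces, dt, element_areas):
    """contact_forces: (n_steps, n_elements, 3) forces in N;
    dt: time-step size in s; element_areas: (n_elements,) in m^2."""
    force_magnitudes = np.linalg.norm(contact_forces, axis=2)   # (n_steps, n_elements)
    impulse_per_element = (force_magnitudes * dt).sum(axis=0)   # N*s, integrated over the impact
    return impulse_per_element / element_areas                  # Pa*s

# Synthetic example: 4 surface elements over 100 time steps
rng = np.random.default_rng(0)
forces = rng.random((100, 4, 3)) * 1e-3
print(sph_impulse_intensity(forces, dt=1e-5, element_areas=np.full(4, 1e-6)))
```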
The areas most a?ected by the ?uid particles during their migration to the occipital/parietal bones, i.e. the acceleration phase, are the parietal and upper temporal lobes. The higher SPH impulse intensity values become more visible also in the occipital lobe when the ?uid particles change direction and start their migration towards the frontal bone, i.e. at the peak. Simulating Concussion Symptoms 563 Fig. 5. Coup (a) and contrecoup (b) pressure responses in cerebrospinal ?uid compared to the experimental results of Nahum et al. [12]. Fig. 6. High values of the second deviatoric principal stress are observed in both the frontal and occipital lobes of the brain, i.e. forehead injury can result in additional injury to occipital area. High values are prevalent mostly in the inner areas of the two hemispheres close to the edges where longitudinal ?ssure separates the two halves of the brain (dashed rectangle). Cerebral structures have been correlated with speci?c functions [23,24]. While the structure-function relationship is still debated, Brodmann’s map is frequently cited [23]. Figure 8 imposes Brodmann’s map of cytoarchitectonics and depicts the functional areas most a?ected at the peak. Areas ‘40’, ‘4’, ‘3,1,2’ and ‘52’ are those covered with more than 10% of SPH impulse intensity maxima (10.1, 11.7, 15.3 and 21.7%, respectively). 564 M. Toma Fig. 7. High values of the second deviatoric principal stress are observed also in the parietal lobe. However, in the parietal lobe the areas with high values are observed only in the posterior aspects of the gyri (schematic and dashed ellipsoid). Fig. 8. The SPH impulse intensity at the peak superimposed with the Brodmann’s map of cytoarchitectonics [25]. 4 Discussion The di?erent layers of the brain move at di?erent times because each layer has a di?erent density. Simpli?ed computational models are not able to incorporate this important aspect. Moreover, interaction between CSF and brain gyri and sulci can not be analyzed computationally if the methods used do not model the CSF as ?uid. The model used in this study uses a comprehensive head/brain model with detailed representation of all the parts and the computational anal-ysis used is an FSI method with ?uid properties for the CSF. The validation of this model and the computational method is shown comparing the coup and con-trecoup pressure responses in CSF with the experimental results from cadaveric experiments. Simulating Concussion Symptoms 565 A few anatomical features are omitted in the head model; namely the skin, arachnoid granulations, spinal cord, vasculature, and meninges. Obviously, skin is irrelevant in this case. Due to the relatively slow CSF ?ow, the arachnoid gran-ulations are negligible. The spinal cord, vasculature, and meninges are omitted at this stage to make the simulations less computationally expensive, but they may be considered in future studies. In Fig. 5, where coup and contrecoup pressure responses in CSF compared to the experimental results of [12] are shown, it can be observed that the agree-ment with the experimental results is better in the coup response as opposed to that in the contrecoup response. The contrecoup pressure response reaches slightly higher values compared to the experimental data because the contrecoup response is secondary and therefore more dependent on the patient-speci?c geom-etry used. However, both coup and contrecoup computational pressure responses can be considered of good agreement with the experimental measurements. 
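The reported agreement could also be made quantitative. The sketch below shows one possible way, comparing peak pressure and a normalised RMS error between a simulated and a measured pressure history; the traces are synthetic placeholders, not the data of Nahum et al. [12] or of the present simulations.

```python
import numpy as np

# One hedged way to quantify coup/contrecoup agreement: resample the
# simulated trace onto the experimental time base, then compare peak
# pressure and a normalised RMS error. All inputs below are synthetic.

def agreement_metrics(t_exp, p_exp, t_sim, p_sim):
    p_sim_on_exp = np.interp(t_exp, t_sim, p_sim)                       # common time base
    peak_error = abs(p_sim_on_exp.max() - p_exp.max()) / abs(p_exp.max())
    nrmse = np.sqrt(np.mean((p_sim_on_exp - p_exp) ** 2)) / (p_exp.max() - p_exp.min())
    return peak_error, nrmse

t = np.linspace(0.0, 8e-3, 200)                          # 8 ms impact window
p_measured = 150e3 * np.exp(-((t - 3e-3) / 1e-3) ** 2)   # synthetic coup pulse [Pa]
p_computed = 1.05 * p_measured + 2e3 * np.sin(2e3 * t)   # synthetic model output [Pa]
print(agreement_metrics(t, p_measured, t, p_computed))
```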
As discussed, if the interaction of CSF with the brain is to be analyzed the CSF has to be modeled with ?uid elements or particles and not just with ?uid-like solid elements. The results then have potential to show more complex responses to the loading conditions. For example, Fig. 6 shows that the contrecoup stress response is prevalent mostly in the inner areas of the two hemispheres close to the edges where longitudinal ?ssure separates the two halves of the brain. The brain model is comprehensive containing multiple parts each with detailed real-istic patient-speci?c geometry. The complexity of the model enables the analysis of the brain down to the exact gyrus and sulcus. Additional areas of high stress values can be found outside the frontal and occipital lobes. However, interest-ingly, only the posterior aspect of the gyrus seems to be a?ected. This can be explained by following the wave in the CSF that occurs after the impact to the frontal lobe [25]. During the acceleration phase when the brain wants to move backwards relative to the skull the ?uid particles move to concentrate in the space between the skull and occipital lobe to provide the cushioning e?ect and prevent the brain from impacting to the skull. At that point the moving particles a?ect mostly the anterior sides of the gyri. When the brain rebounds and wants to move forward relative to the skull the ?uid particles move to the space between the skull and frontal lobe to provide the cushioning e?ect there. At that point the moving particles a?ect mostly the posterior side of the gyri. Other parts of the brain, such as the brain stem, are equally a?ected by the coup-contrecoup injury. The variables readily available in the SPH methods are somewhat di?erent from those commonly used to post-process the results in the biomedical ?uid mechanics, e.g. wall shear stress extracting of which would be more challenging when using the SPH methods. On the other hand, e.g. SPH impulse intensity can be used in its stead as it o?ers similar meaning. In order to maintain as much anatomical accuracy as possible, SPH is used in this study instead of the traditional FSI techniques which would require more anatomical simpli?cations to keep the convergence criteria satis?ed. 566 M. Toma The cortical areas a?ected by SPH impulse intensity at the peak are pre-sented in Fig. 8 [25,26]. It is o?ered that the patterns of SPH impulse intensity maxima may represent the cortical areas most a?ected by a concussion. Areas ‘40’, ‘4’, ‘3,1,2’, and ‘52’ are the Brodmann’s areas with at least 10% coverage of maximal SPH impulse intensity. The left supramarginal gyrus, i.e. Brodmann area ‘40’, receives input from multiple sensory modalities and supports complex linguistic processes. Lesions in that area may yield Gerstmann syndrome and ?uent aphasia, such as Wernicke’s aphasia. Motor functions are typically asso-ciated with Brodmann area ‘4’, but it also plays a supportive role in sensory perception. Lesions there may result in paralysis and decreased somatic sensa-tion. Brodmann areas ‘3,1,2’ comprise the postcentral gyrus in the parietal lobe and are primarily associated with somatosensory perception. Lesions there may result in cortical sensory impairments, e.g. loss of ?ne touch and proprioception. Brodmann area ‘52’, i.e. the parainsular, is the smallest of the mentioned areas and has the highest percentage of SPH impulse intensity maxima coverage. It joins the insula and the temporal lobe. 
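Coverage percentages such as those quoted above for areas '40', '4', '3,1,2' and '52' could, in principle, be tabulated by labelling each cortical surface element with a Brodmann area and counting the elements whose impulse intensity lies near the global maximum. The sketch below illustrates this bookkeeping only; the labels, data, and the 95% band are assumptions, not the procedure used to produce Fig. 8.

```python
import numpy as np
from collections import Counter

# Hedged sketch: per-area share of the elements whose SPH impulse
# intensity falls within a band of the global maximum.

def coverage_by_area(intensity, brodmann_label, band=0.95):
    near_max = intensity >= band * intensity.max()
    counts = Counter(brodmann_label[near_max])
    total = near_max.sum()
    return {area: 100.0 * n / total for area, n in counts.items()}

intensity = np.random.default_rng(1).random(1000)
labels = np.random.default_rng(2).choice(["40", "4", "3,1,2", "52", "other"], size=1000)
print(coverage_by_area(intensity, labels))
```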
This validated model, where an FSI method is used to analyze the interac-tion between CSF and brain, is a step closer to understanding the mechanisms of brain injuries. Concussions are usually diagnosed symptomatically. Patients may exhibit a range of symptoms, such as headache, tinnitus, photophobia, sleepi-ness, dizziness, behavioral changes and confusion. Di?erent area of brain a?ected would potentially result in di?erent set of symptoms. The model and method presented in this study can predict the areas a?ected based on the loading con-ditions. Therefore, the symptoms can be predicted, too. Since the signs and symptoms of a concussion can be subtle and may not show up immediately, a numerical analysis of this kind could serve as a predictor for the physicians and patients who then could be warned about what symptoms they are to expect and be ready for. Hence, if used in practice, it has the potential to contribute to early diagnosis which is important in treatment of concussion. References 1. Goldsmith, W.: Current controversies in the stipulation of head injury criteria - letter to the editor. J. Biomech. 14(12), 883–884 (1981) 2. Luo, Y., Li, Z., Chen, H.: Finite-element study of cerebrospinal ?uid in mitigating closed head injuries. J. Eng. Med. 226(7), 499–509 (2012) 3. Cha?, M.S., Dirisala, V., Karami, G., Ziejewski, M.: A ?nite element method parametric study of the dynamic response of the human brain with di?erent cerebrospinal ?uid constitutive properties. In: Proceedings of the Institution of Mechanical Engineers, Part H (2009). Journal of Engineering in Medicine 223(8), 1003–1019 4. Liang, Z., Luo, Y.: A QCT-based nonsegmentation ?nite element head model for studying traumatic brain injury. Appl. Bionics Biomech. 2015, 1–8 (2015) 5. Gilchrist, M.D., O’Donoghue, D.: Simulation of the development of the frontal head impact injury. J. Comp. Mech. 26, 229–235 (2000) Simulating Concussion Symptoms 567 6. Ghajari, M., Hellyer, P.J., Sharp, D.J.: Computational modelling of traumatic brain injury predicts the location of chronic traumatic encephalopathy pathology. Brain 140(2), 333–343 (2017) 7. McCrea, M., Hammeke, T., Olsen, G., Leo, P., Guskiewicz, K.: Unreported con-cussion in high school football players: implications for prevention. Clin. J. Sport Med. 14(1), 13–17 (2004) 8. Rengachary, S.S., Ellenbogen, R.G.: Principles of Neurosurgery. Elsevier Mosby, New York (2005) 9. Toma, M., Nguyen, P.: Fluid-structure interaction analysis of cerebral spinal ?uid with a comprehensive head model subject to a car crash-related whiplash. In: 5th International Conference on Computational and Mathematical Biomedical Engi-neering - CMBE2017. University of Pittsburgh, Pittsburgh (2017) 10. Yanagida, Y., Fujiwara, S., Mizoi, Y.: Di?erences in the intracranial pressure caused by a blow and/or a fall - experimental study using physical models of the head and neck. Forensic Sci. Int. 41, 135–145 (1989) 11. Nahum, A.M., Gatts, J.D., Gadd, C.W., Danforth, J.: Impact tolerance of the skull and face. In: 12th Stapp Car Crash Conference, Warrendale, PA, pp. 302– 316. Society of Automotive Engineers (1968) 12. Nahum, A.M., Smith, R.W., Ward, C.C.: Intracranial pressure dynamics during head impact. In: 21st Stapp Car Crash Conference (1977) 13. Fry, F.J., Barger, J.E.: Acoustical properties of the human skull. J. Acoust. Soc. Am. 63(5), 1576–1590 (1978) 14. Barser, T.W., Brockway, J.A., Higgins, L.S.: The density of tissues in and about the head. Acta Neurol. Scandinav. 46, 85–92 (1970) 15. 
Elkin, B.S., Azeloglu, E.U., Costa, K.D., Morrison, B.: Mechanical heterogene-ity of the rat hippicampus measured by atomic force microscope indentation. J. Neurotrauma 24, 812–822 (2007) 16. Gefen, A., Gefen, N., Zhu, Q., Raghupathi, R., Margulies, S.S.: Age-dependent changes in material properties of the brain and braincase of the rat. J. Neurotrauma 20, 1163–1177 (2003) 17. Kruse, S.A., Rose, G.H., Glaser, K.J., Manduca, A., Felmlee, J.P., Jack Jr., C.R., Ehman, R.L.: Magnetic resonance elastography of the brain. Neuroimage 39, 231– 237 (2008) 18. Moore, S.W., Sheetz, M.P.: Biophysics of substrate interaction: in?uence on neutral motility, di?erentiation, and repair. Dev. Neurobiol. 71, 1090–1101 (2011) 19. Lui, A.C., Polis, T.Z., Cicutti, N.J.: Densities of cerebrospinal ?uid and spinal anaesthetic solutions in surgical patients at body temperature. Can. J. Anaesth. 45(4), 297–303 (1998) 20. Toma, M., Einstein, D.R., Bloodworth, C.H., Cochran, R.P., Yoganathan, A.P., Kunzelman, K.S.: Fluid-structure interaction and structural analyses using a com-prehensive mitral valve model with 3D chordal structure. Int. J. Numer. Meth. Biomed. Engng. 33(4), e2815 (2017). https://doi.org/10.1002/cnm.2815 21. Toma, M., Oshima, M., Takagi, S.: Decomposition and parallelization of strongly coupled ?uid-structure interaction linear subsystems based on the Q1/P0 discretization. Comput. Struct. 173, 84–94 (2016). https://doi.org/10.1016/j. compstruc.2016.06.001 22. Toma, M.: The emerging use of SPH in biomedical applications. Signi?cances Bio-eng. Biosci. 1(1), 1–4 (2017). SBB.000502 23. Brodmann, K.: Vergleichende Lokalisationslehre der Grosshirnrinde (in German). Johann Ambrosius Barth, Leipzig (1909) 568 M. Toma 24. Limited TCT Research (ed.) Cortical Functions. Trans Cranial Technologies ltd. (2012) 25. Toma, M., Nguyen, P.: Fluid-structure interaction analysis of cerebrospinal ?uid with a comprehensive head model subject to a rapid acceleration and deceleration. Brain Inj. 1–9 (2018). https://doi.org/10.1080/02699052.2018.1502470 26. Varlotta, C., Toma, M., Neidecker, J.: Ringside physicians’ medical manual for boxing and mixed martial arts: technology & impact sensor testing. Association of Ringside Physicians, Chapter D10 (2018) Integrating Markov Model, Bivariate Gaussian Distribution and GPU Based Parallelization for Accurate Real-Time Diagnosis of Arrhythmia Subclasses Purva R. Gawde1(&) , Arvind K. Bansal1 , and Jeffery A. Nielson2 1 Department of Computer Science, Kent State University, Kent, OH 44240, USA pgawde@kent.edu, arvind@cs.kent.edu 2 Department of Emergency, Northeast Ohio Medical University, Rootstown, OH, USA jeffnielson@gmail.com Abstract. In this paper, we present the integration of SIMT (Single Instruction Multiple Threads), Markov model and bivariate Gaussian distribution as a general-purpose technique for real-time accurate diagnosis of subclasses of arrhythmia. The model improves the accuracy by integrating both morpholog-ical and temporal features of ECG. GPU based implementation exploits con-current execution of multiple threads at the heart-beat level to improve the execution ef?ciency. The approach builds a bivariate Gaussian Markov model (BGMM) for each subclass of arrhythmia where each state includes bivariate distribution of temporal and morphological features of each waveform and ISO-lines using ECG records for each subclass from standard databases, and the edge-weights represent the transition probabilities between states. 
Limited 30- second subsequences of a patient’s beats are used to develop bivariate Gaussian transition graphs (BGTG). BGTGs are matched with each of the BGMMs to derive the exact classi?cation of BGTGs. Our approach exploits data-parallelism at the beat level for ECG preprocessing, building BGTGs and matching multiple BGTG-BGMM pairs. SIMT (Single Instruction Multiple Thread) available on CUDA resources in GPU has been utilized to exploit data-parallelism. Algo-rithms have been presented. The system has been implemented on a machine with NVIDIA CUDA based GPU. Test results on standard MIT- BIH database show that GPU based SIMT improves execution time further by 78% with an overall speedup of 4.5 while retaining the accuracy achieved by the sequential execution of the approach around 98%. Keywords: ArrhythmiaAI techniquesECG analysisGaussian GPUMarkov modelMedical diagnosisMachine learning ParallelismWearable devices © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 569–588, 2019. https://doi.org/10.1007/978-3-030-02686-8_43 1 Introduction An aging population is challenging the current healthcare system by increasing costs, creating a lack of healthcare personnel, and contributing to more complex combinations of chronic diseases [1]. Cardiovascular diseases like arrhythmia, ischemia, myocardial infarction and cardiomyopathy (including hypertrophy) are some of the most common problems in elderly leading to sudden cardiac death (SCD) [1, 2] and congestive heart failure. Often, these symptoms go undetected due to the transient nature of symptoms and the mobile life-style of the modern society. Transitory nature of arrhythmia requires monitoring of ECG to diagnose and reduce the risk of SCD [1] including life-threatening ventricular ?brillation [3]. The demand for an improved healthcare system requires development of infor-mation technology, and one area of opportunity is wearable smart monitoring devices [4, 5]. Advances in microelectronics have provided smaller, faster and more affordable embedded platforms for personal monitoring systems such as the NVIDIA Jetson GPU [4, 5]. Most of these wearable biomedical systems can detect a variety of abnormalities such as stress, oxygen level saturation, ischemia and arrhythmias, but with limited accuracy. ECG signal analysis for real-time detection of abnormalities involves computation-ally expensive modules like signal denoising, morphological and temporal feature extractions; complex functional transforms [6], computational intelligence techniques for classi?cation and machine learning. The AI techniques include the use of Bayesian network [7], neural networks [8] and Markov models [9, 10]. The computational overhead of exploiting these techniques is signi?cant and violates the basic requirement of resource-limited smart wearable devices diagnosing abnormality accurately in real-time. In recent years, several researchers have exploited GPU based SIMT (Single Instruction Multiple Threads) parallelism to improve the computational ef?ciency for automated ECG analysis [11], de-noising [12], and classi?cation of premature beats using neural networks [8, 13]. Different techniques for parallelization include time-domain analysis [7] and probabilistic neural networks [13]. For arrhythmic beat clas-si?cation, Fan, Xiaomao, et al. [14] have proposed GPU based detection of seven types of beats using thresholds and rule-based system. 
These studies indicate that GPU based parallelization signi?cantly improves the computational ef?ciency of ECG analysis. However, these studies separate only premature ventricular complex beats from normal beats [13], and do not address the diagnosis of the subclassi?cation of ventricular and supraventricular arrhythmia in real time. The ?ner classi?cation of arrhythmias is important because different subclasses require different treatment [2]. For instance, ventricular tachycardia is generally treated with antiarrhythmic drugs [2]; while ventricular ?brillation needs immediate treatment by a de?brillator. Subclasses of supraventricular arrhythmia like the atrial flutter can result into blood clots leading to cerebrovascular events [3] if not treated. Finer subclassi?cation requires an integrated model that can capture both mor-phological and temporal characteristics of ECG and consider transition probabilities within waveforms to account for waveform variations. Arrhythmic ECG also presents a 570 P. R. Gawde et al. challenge when some waveform features are embedded in another waveform [3], which can lead to misclassi?cation [1, 3]. Our earlier work focused on detecting ?ner subclasses of supraventricular and ventricular arrhythmia in real time using the integration of Markov models and the identi?cation of embedded P-waves [9, 10]. The run-time detection of the disease subclass requires: (1) statistical derivation of a theoretical Markov model graph for each subclass; (2) dynamically building a real-time graph using a limited number of beats at the run-time; and (3) matching the real-time graph from an individual patient to the derived graphs to best classify the patient condition. Our previous work needs to be further improved for time-ef?ciency because resource-limited wearable devices need to analyze the ECG for other heart abnor-malities such as ischemia (lack of oxygen), electrolyte imbalance such as hyperkalemia (excessive potassium), myocardial infarction (heart failure due to prolonged ischemia) to name a few. Additional improvement in execution-time is required to facilitate real-time detection of other heart abnormalities concurrently and in real-time in resource-limited miniaturized wearable devices [4, 5]. In this research, we propose an integrated general-purpose BGMM (Bivariate Gaussian Markov Model) model that further improves the accuracy by associating bivariate Gaussian distribution of amplitude and duration of the waveforms and ISO-lines with each state of the Markov model. We improve execution ef?ciency by exploiting SIMT parallelism available on GPU as shown in Fig. 1. The major contributions in this paper are: 1. The development of a general-purpose model that integrates bivariate Gaussian distribution of amplitude and duration of waveforms for a state with Markov model to integrate morphological and temporal features. 2. The exploitation of SIMT based concurrency on GPUs that signi?cantly improves the execution ef?ciency of the ?ner subclassi?cation of arrhythmia. 3. The development of multiple algorithms for beat-level exploitation of SIMT for dynamic graph building and graph matching exploiting expectation maximization for the arrhythmia subclassi?cation. The remainder of the paper is organized as follows: Sect. 2 describes the back-ground concepts of Markov model and bivariate Gaussian distribution. Section 3 describes our BGMM based approach for arrhythmia subclassi?cation. Section 4 dis-cusses SIMT parallelization of the approach. 
Section 5 discusses algorithms for the execution of the kernel functions; Sect. 6 discusses implementation and performance results. Section 7 compares our approach and performance with other related works. Section 8 concludes the paper and discusses future directions.

Fig. 1. Personal monitoring system for multiple abnormalities detection (sensor data → CPU preprocessing → multiprocessor embedded GPU with SIMT concurrency → executable functions for abnormality detection 1…n).

2 Background

2.1 Arrhythmia Subclassification

Arrhythmia is defined as irregular heartbeats caused by irregular and refractory pulse patterns due to ectopic nodes arising outside the sinus node. Arrhythmia is broadly classified into supraventricular arrhythmias, arising above the lower chambers of the heart, and ventricular arrhythmias, arising in the lower chambers of the heart. Supraventricular arrhythmias are further subclassified as: (1) Atrial fibrillation (AFib); (2) Atrial flutter (AF); (3) Atrial-ventricular nodal reentry tachycardia (AVNRT); and (4) Ectopic atrial tachycardia (EAT). Ventricular arrhythmia is classified into three major subclasses: (1) Ventricular Tachycardia (VTach); (2) Ventricular Flutter (VFlu); and (3) Ventricular Fibrillation (VFib). Different subclasses pose different levels of threat to health and are treated differently [3].

Atrial fibrillation (AFib) is characterized by the absence of P-waves and a QRS complex duration of less than 120 ms, with an atrial rate of 400–600 beats per minute (bpm). Atrial flutter (AF) is characterized by the presence of P-waves with shorter duration, an elevated PQ baseline, and 250–350 atrial bpm. Atrial-ventricular nodal reentry tachycardia (AVNRT) is characterized by retrograde P-waves after or embedded inside the QRS complex, with an atrial rate of 250–300 bpm. Ectopic atrial tachycardia (EAT) is characterized by negative P-waves, T-wave elevation, and a heart rate of around 150 bpm. Ventricular Tachycardia (VTach) is typically characterized by a wide S-wave (>100 ms), an elevated R-wave, a wide T-wave, and a heart rate greater than 100 bpm. Ventricular Flutter (VFlu) is characterized by the absence of P-waves, T-waves, S-waves and baselines, wide R-waves with elevated amplitude, and an increased QT duration with a heart rate of 180–250 bpm. Ventricular Fibrillation (VFib) is characterized by no identifiable P-wave, T-wave or ISO lines, elevated ST baselines, and a heart rate of 150–500 bpm.

Fig. 2. A subclassification of arrhythmia (supraventricular: AFib, AF, AVNRT, EAT; ventricular: VTach, VFlu, VFib).

2.2 Markov Model

A Markov model [15] is a probabilistic finite-state nondeterministic automaton modeled by a 5-tuple of the form (set of all states, set of initial states, set of final states, transition matrix, initial-state-probability vector). Weighted edges are the transition probabilities between two adjacent states. Statistical analysis based upon transition frequency is used to build Markov models.

2.3 Bivariate Gaussian Distribution

The joint distribution of two variables, denoted A and B, each having a normal Gaussian distribution [16, 17], is calculated using the conditional variance, which is based on the correlation between the variables [17]. Assume \mu_A and \sigma_A denote the mean and standard deviation of the variable A, and \mu_B and \sigma_B denote the mean and standard deviation of B. The conditional mean of B is calculated by (1):

E(B \mid A) = \mu_B + \rho \frac{\sigma_B}{\sigma_A} (A - \mu_A)    (1)

where \rho denotes the correlation coefficient between the variables A and B. The conditional variance of B is calculated by (2):

\sigma_{B|A}^2 = \sigma_B^2 (1 - \rho^2)    (2)

The conditional distribution of the variable B given A = a is calculated by (3), where \mu_{B|A} is the conditional mean (1) evaluated at A = a:

h(b \mid a) = \frac{1}{\sigma_{B|A}\sqrt{2\pi}} \exp\left[-\frac{(b - \mu_{B|A})^2}{2\sigma_{B|A}^2}\right]    (3)

Using the conditional distribution of B, the joint probability distribution is calculated by (4):

f(a, b) = f_A(a) \cdot h(b \mid a)    (4)

2.4 Statistical Modeling of ECG for Subclassification

Bivariate Gaussian Markov Model (BGMM). A bivariate Gaussian Markov model (BGMM) is a special class of Markov models that integrates a joint Gaussian distribution [17] of the feature vectors for the states with probabilistic transitions between the states. It is modeled as a weighted directed graph in which the transition probabilities between two adjacent states are the edge weights and the state value is the joint Gaussian distribution of two variables: amplitude and duration. A BGMM has eight states and their transitions. The eight states are: (1) P-wave features; (2) Q-wave features; (3) R-wave features; (4) S-wave features; (5) T-wave features; (6) PQ iso-segment; (7) ST iso-segment; and (8) TP iso-segment.

Bivariate Gaussian Transition Graph (BGTG). A bivariate Gaussian transition graph (BGTG) is a weighted directed graph that shows the probability of transition between the adjacent states of a finite-state automaton such as a BGMM. However, a BGTG is built from a small sample of the same patient's heartbeats, whereas a BGMM carries the large sample size of multiple patients sharing a common physician-annotated abnormality. Matching a BGTG with the BGMM graphs provides the subclassification of a patient's ECG.

2.5 ECG Signal Preprocessing

Denoising. Raw ECG signals from the MIT-BIH database [18] contain at least three types of noise: electromyography noise from muscle movement, radio-frequency noise, and power-line noise [6]. Discrete Wavelet Transform (DWT), a multi-resolution decomposition scheme, is used to eliminate these noises [6]. The source signal is decomposed into low- and high-frequency sub-bands, and low-pass and high-pass filters are used to remove the low-frequency and high-frequency noise sub-bands, respectively.

Feature Extraction. Amplitude and duration are extracted for the waveforms (P, Q, R, S and T) and for the baselines (TP or ISO1, PQ or ISO2, and ST or ISO3). The Daubechies 6 (D6) wavelet transform is used to detect the amplitude and duration of the waveforms in each beat [6]. The wavelet transforms are scaled up to eight levels to obtain the corresponding approximation coefficients. Four separate algorithms [6] are used to detect the R-wave, the Q- and S-waves, the PQ and ST segments, and the P-waves. The durations of the waveforms and baselines are derived from the zero crossings of the waveforms.

SIMT and Parallel Computations. The SIMT (Single Instruction Multiple Threads) paradigm is based upon executing the same sequence of instructions concurrently by spawning multiple lightweight threads.

2.6 GPU and CUDA Architecture

A CUDA-based GPU has multiprocessor cores and acts as a coprocessor to the main CPU. CUDA (Compute Unified Device Architecture) supports data parallelism using the SIMT paradigm by spawning a high number of concurrent threads on different sets of data elements in compute-intensive applications [19]. Streaming multiprocessors (SM) are assigned to multiple groups of threads, called blocks, using a grid architecture [19] as shown in Fig. 3.
Each SM has multiple CUDA cores that are comprised of ALUs, FPUs (Floating Processing Unit), load/store units and registers. These cores are assigned automatically to balance the load by the SM scheduler. The GPU supports high latency global memory to share information between CPU and GPU, short latency constant memory that cannot be altered during a thread’s execution, limited on-chip shared memory and local memory. Global memory is also used to share information across SMs. Constant 574 P. R. Gawde et al. memory is a cache memory written into before spawning the corresponding thread. It does not allow rewriting during the thread execution. A block is a group of threads that can be executed concurrently. These threads communicate to each-other using low latency shared memory. The threads are auto-matically allocated CUDA cores to exploit concurrency and balance the load. NVIDIA GPU Based Architecture. NVIDIA GPU exploits data parallelism by concurrent spawning of multiple threads. These threads are automatically allocated CUDA cores, over which a programmer has no control. Distribution of data on SMs for exploiting concurrency is also automated, and this cannot be speci?ed by the pro-grammer, either. The spawning of multiple blocks enhances the chance of concurrent utilization of multiple SMs by mapping different blocks on different SMs. 3 BGMM Based Classi?cation of Arrhythmia Each state of the BGMM is associated with joint distribution of two variables: amplitude and duration. Transitions between the states represent transition probabilities between the states. Values of zero vary in meaning for amplitude and duration: The duration of zero for any of the baseline segments: ISO1 (TP-segment), ISO2 (PQ-segment) and ISO3 (ST segment) imply that the corresponding state in the BGMM is bypassed (i.e. the event never occurred). Conversely, an amplitude-value of zero is anticipated, and does not imply the absence of transitions between the ISO-states and the corresponding P-Q-R- S-T states because ISO-states have no peak (i.e. zero amplitude) in regular heart-beats. P-waves embedded in the QRS-complex are considered missing. The overall approach for real time irregular beat subclassi?cation is divided into two phases (as shown in Fig. 4): (1) a training phase that uses the standard MIT-BIH database [18], and (2) a dynamic diagnosis phase based upon real-time collection and analysis of a sequence of multiple beats-windows. Training Phase: A BGMM is constructed for each subclass using the annotated MIT- BIH database [18]. The training phase has four stages: (1) denoising the beats; (2) feature extraction (amplitude and duration of each waveform in a beat); (3) area subtraction to identify embedded waveforms; and (4) construction of Markov model. Dynamic Detection Phase: This phase has six stages: (1) de-noising of acquired beats (2) heartbeat collection for 30 s window; (3) morphological and temporal feature’s … … Grid 1 Block (1,1) Kernel CPU GPU Block(0,0) Block (1,0) Block(0,1) Block (0,0) Thread 1 Thread n Fig. 3. A CUDA architecture. Integrating Markov Model, Bivariate Gaussian Distribution 575 extraction; (4) embedded P-wave and R-wave detection, (5) BGTG construction and 6) BGTG classi?cation. The second stage is executed once for ?rst window of signal; subsequent windows do not require this stage because they incrementally build the statistical information by adding next beat information and removing the least recent beat information. 
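To make the per-state model concrete, the following minimal Python sketch evaluates the bivariate Gaussian density of Eqs. (1)–(4) for one BGMM state, treating amplitude and duration as the two variables. The numeric parameters are placeholders chosen for illustration (loosely inspired by the R-wave averages in Table 1), not values learned from the MIT-BIH records.

```python
import math

# Minimal sketch of the per-state bivariate Gaussian likelihood of
# Sect. 2.3: the marginal f_A(a) times the conditional h(b|a) gives the
# joint density f(a, b) of (amplitude, duration) for one BGMM state.

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def bivariate_state_density(a, b, mu_a, sigma_a, mu_b, sigma_b, rho):
    mu_b_given_a = mu_b + rho * (sigma_b / sigma_a) * (a - mu_a)   # Eq. (1)
    sigma_b_given_a = sigma_b * math.sqrt(1.0 - rho ** 2)          # Eq. (2)
    return gaussian_pdf(a, mu_a, sigma_a) * \
           gaussian_pdf(b, mu_b_given_a, sigma_b_given_a)          # Eqs. (3)-(4)

# e.g. an R-wave state with mean amplitude 1.8 mV and mean duration 0.6 s
print(bivariate_state_density(a=1.7, b=0.58, mu_a=1.8, sigma_a=0.2,
                              mu_b=0.6, sigma_b=0.05, rho=0.3))
```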
A win-dow of 30 s is chosen for beat analysis to balance the quick response time needed in emergency conditions and to maintain accuracy. Each GPU analyzes around 20 beats based on optimal error analysis [17] using a con?dence interval of 95%. Statistical analysis showed that error increases by 2% for 10 beats, and decreases only by 0.2% for 40 beats. However, performance degrades for 40 beats window. 3.1 Embedded Waveforms Detection Embedded waveform analysis is required to derive one waveform embedded in another. This can occur in the same beat or a preceding beat. An embedded waveform can often be mistakenly considered missing [3] leading to misclassi?cation of sub-classes [9, 10]. In our previous work [9, 10], we identi?ed P-waves embedded in QRS-complex for the accurate diagnosis of EAT, and R-wave embedded in T-wave of the previous beat in VTach. The embedded waveforms are detected by area-subtraction technique [10, 20]. Area subtraction is based upon ?nding the mean area of each type of waveforms and sub-tracting the observed waveform area in the current beat from the corresponding mean. The calculation uses a threshold for identifying embedded waveforms [3, 10] with a con?dence interval [17] of 95%. After area subtraction of the initial waveform, the embedded P-wave or R-wave is allocated the mean amplitude and duration. 3.2 Bivariate Gaussian Transition Graph (BGTG) Construction A BGTG is constructed by extracting the amplitude and duration of each of the eight states and transitions between them. Zero durations in waveforms or ISO-states reflect missing corresponding states. Embedded wave analysis is utilized to identify the absent edges in the Markov model. Frequency analysis is used to derive transition probability. Denoising Feature extraction Embedded waveform BGMM construction Denoising Embedded waveform detection First window analysis BGTG Training Phase Dynamic Phase Graph Matching Feature extraction Fig. 4. Bivariate Gaussian Markov model approach. 576 P. R. Gawde et al. Figure 5 shows an example of a BGTG constructed for annotated beats of the EAT arrhythmia in MIT-BIH [18] dataset. Table 1 shows average amplitude and durations obtained for the same window. Transition from ISO3 ! T is only 0.02 meaning T-waves are absent during EAT because the next depolarization (i.e. P-wave) begins before the repolarization [3]. In addition, ectopic foci lead to negative amplitude of P-wave. 3.3 Graph Matching After constructing the BGTG, the diagnosis reduces to matching the BGTG with the BBGMMs for appropriate classi?cations [9, 10]. The algorithm has three steps: Step 1: For the constructed BGTG, most probable path (MPP) is identi?ed. An MPP is the path from ISO1 to ISO1 with the highest transition probability. For the BGTG given in Fig. 2, MPP is given by: ISO1!P!ISO2!Q!R!S!ISO3!ISO1. Step2: Transition probabilities below 0.05 are removed from BGTG to eliminate noise. The derivation of the threshold is based upon statistical analysis [17] of noise present in dataset [18]. A subset of the BGMMs is selected that includes all the transitions present in the BGTG. This step gives the list of prospective matching of BGMMs. 
Step 3: For all the BGMMs obtained from Step 2, graph matching is performed by multiplying two values: (1) the probability that the observed bivariate distribution of a state in the BGTG is produced by the corresponding state in the BGMM, computed using maximum likelihood estimation (MLE) [16]; and (2) the probability that the states in the observed beats are generated by the given BGMM based on its transition probabilities, computed using a standard forward-backward algorithm [15]. The BGTG is classified to the BGMM with the maximum likelihood.

Fig. 5. A sample BGTG for a 20-beat window.

Table 1. Average amplitude and duration

Waveform  Amplitude  Duration
P-wave    -0.20 mV   0.08 s
Q-wave    -0.14 mV   0.2 s
R-wave    1.8 mV     0.6 s
S-wave    -0.2 mV    0.1 s
T-wave    0.17 mV    0.10 s
ISO1      0          0.11 s
ISO2      0          0.09 s
ISO3      0          0.07 s

4 Concurrent Model

4.1 Dependency Analysis

Figure 6 shows the various modules and their share of the execution time, and Table 2 shows the average processing time required for the four major modules. The ECG preprocessing module has two submodules: a denoising module and a feature extraction module. The high-level modules cannot be executed concurrently due to the inherent dependency between them: preprocessing → embedded wave detection → BGTG construction → graph matching. However, the denoising, feature extraction, embedded wave analysis and BGTG construction modules require beat-level analysis and shared memory to merge the data from the individual beat analyses. Graph matching matches one BGTG with multiple BGMMs. While the first three modules can exploit data parallelism at the beat level within the same SM (streaming multiprocessor), graph matching requires data parallelism for concurrently matching multiple BGTG-BGMM pairs.

Fig. 6. Timing analysis of the bivariate Markov model approach.

Two major issues in exploiting GPU based parallelism are: (1) the mismatch of the latencies of the different memories; and (2) the mismatch between the CPU-GPU data transfer rate and the data transfer rate between SMs within the GPU. Thus, we have to optimize the task distribution so that the faster memory accesses in the GPU are exploited without excessive data transfer through the slower global memory. In addition, we have to maintain the accuracy of the diagnosis while distributing the beats across the SMs in the GPU based on statistical analysis. In our case, the CPU performs real-time ECG collection and the spawning of the data analysis, while the data-parallel work is done in the GPU.

Feature extraction has two functionalities: (1) identification of the waveforms; and (2) extraction of the amplitude and duration of each waveform and ISO line. The first task begins without prior knowledge about the waveforms. It has eight subtasks: (1) R-wave extraction; (2) Q-wave extraction; (3) S-wave extraction; (4) zero-crossing detection to get the ISO2 baseline; (5) zero-crossing detection to get the ISO3 baseline; (6) P-wave extraction; (7) T-wave extraction; and (8) ISO1 extraction using knowledge of the P- and T-waves. There is a task dependency in analyzing the beats: the R-wave is identified first, followed by two task chains, (Q-wave detection → zero crossing to get ISO2 → P-wave detection) and (S-wave detection → zero crossing to get ISO3 → T-wave detection). After the detection of the P-wave and T-wave, ISO1 is identified.
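The dependency chain just described can be expressed with an ordinary thread pool. The Python sketch below is a CPU-side analogue for illustration only; the detect_* helpers are hypothetical stand-ins for the D6-wavelet feature extractors, and this is not the CUDA kernel layout of Sect. 5.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative analogue of the Sect. 4.1 dependency: R-wave first,
# then the (Q -> ISO2 -> P) and (S -> ISO3 -> T) chains in parallel,
# and ISO1 last. Returned values are placeholder (amplitude, duration) pairs.

def detect_r(beat):        return {"R": (1.8, 0.6)}
def left_chain(beat, r):   return {"Q": (-0.14, 0.2), "ISO2": (0.0, 0.09), "P": (-0.20, 0.08)}
def right_chain(beat, r):  return {"S": (-0.2, 0.1), "ISO3": (0.0, 0.07), "T": (0.17, 0.10)}
def detect_iso1(features): return {"ISO1": (0.0, 0.11)}   # needs both P- and T-wave locations

def extract_features(beat):
    r = detect_r(beat)                                 # step 1: R-wave anchors the beat
    with ThreadPoolExecutor(max_workers=2) as pool:    # step 2: two independent chains
        left = pool.submit(left_chain, beat, r)
        right = pool.submit(right_chain, beat, r)
        features = {**r, **left.result(), **right.result()}
    features.update(detect_iso1(features))             # step 3: ISO1 last
    return features

window_features = [extract_features(b) for b in range(20)]   # e.g. one 20-beat window
```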
4.2 Exploiting Concurrency on GPU The overall approach to exploit concurrency consists of three steps: (1) block level parallelism for noise detection and waveform extraction by dividing the data into equal time-slots; (2) exploiting data parallelism at the beat level for the amplitude and duration analysis, embedded wave detection and BGTG construction; and (3) concur-rent matching of BGTG-BGMM pairs by spawning multiple threads within a block, one for each BGTG-BGMM pair. Before starting concurrent processing of time-windows, the initial window for the ?rst 30-s period is analyzed sequentially in the CPU to estimate the statistical infor-mation regarding the waveform features. The analyzed features are: (1) number of beats and individual waveforms in 30 s window; (2) mean, median, minimum and maximum of the amplitude and duration of each type of waveform and ISO-lines. This infor-mation is needed to spawn and terminate multiple threads during concurrent analysis of Table 2. Average processing time for modules Module Processing time (ms) Preprocessing 950 Embedded wave 200 Transition graph 2800 Graph matching 3200 Integrating Markov Model, Bivariate Gaussian Distribution 579 future windows. This information is stored in the global memory and the constant memory for use by SMs for subsequent concurrency exploiting modules. Concurrent Denoising and Feature Extraction. The noise removal submodule processes 30-s window (around 120 beats) of raw ECG signal, and has no knowledge of the waveforms. It performs convolution, low pass and high-pass ?ltering. Hence 30- s windows are divided equally in multiple blocks (>6 per window in our case). After the noise-removal, the signal is input to the feature-extraction module. Since the data is already present in GPU, there is no data transfer overhead. Beat detection and feature vector analysis are performed in one block to exploit shared memory (low latency). Based upon the estimate of the R-waveform counts derived from the initial window analysis, the same number of threads are spawned to concurrently detect individual R-waveforms using barrier-based synchronization (synchronization in Nvidia GPU terminology). After detecting R-waveforms, two sets of concurrent threads are spawned to detect other waveforms and features (Q-wave, ISO2, P-wave) and (S-wave, ISO3, T-wave) respectively. Again, the number of threads spawned in each set is equal to number of detected R-waves. After detecting the waveforms, one thread is spawned to sequentially identify all ISO1 lines in the sample. After feature extraction, feature data is transferred to global memory for BGTG con-struction. Since the data-size is quite small after feature extraction, the overhead of data transfer is also quite small. For each window, there are multiple BGTGs (around six for a 30-s window). Multiple blocks are spawned, one for each BGTG construction, to exploit a maximum number of available SMs in the GPU. For every BGTG, there are three tasks for every state: (1) computing averages of the durations and amplitudes for each type of waveform; (2) computing the joint probability of amplitude and duration; and (3) computing the transition probability. Eight concurrent threads are spawned: one for each state. This exploits data-parallelism. This gives six BGTGs for a 30-s window. Graph matching phase exploits data-parallelism by spawning multiple blocks, one for each BGTG. Each block spawns multiple threads, one for each BGTG-BGMM pair. 
Each thread utilizes the CUDA cores by automatic allocation at the OS level. 5 Algorithms This section describes algorithms for the major concurrent tasks: (1) concurrent denoising and feature-extraction; (2) concurrent embedded-wave detection; (3) con-current BGTG construction; (4) concurrent MPP (most probable path) detection; and (5) concurrent Matching. For describing the concurrent thread spawning, we use the constructs cobegin-coend for modeling concurrent thread-groups that terminate together, barrier to model waiting for a group of threads to terminate together, and forall to spawn multiple threads concurrently. A block of activity in a single thread is enclosed within curly brackets {…}. Blocks are used for processing multiple concurrent activities such as thread-groups working on a ?nite number of beats to exploit maximum utilization of automated thread-groups to SMs mapping in the GPU. 580 P. R. Gawde et al. 5.1 Concurrent Preprocessing and Embedded Wave Detection Algorithm for Concurrent Denoising and Feature Extraction. To execute this kernel function, 30 s of data is divided into number of blocks corresponding to a set of beats based on the average beat area calculated in the initial window analysis. On each block, data is divided between multiple threads. Noise removal and the R-wave detection with amplitude and duration is performed by the threads concurrently. A barrier is used to ?nish the execution of all R-wave detection threads. Next, data is divided into two chunks: left of R-wave (R-wave – D) and right of R-wave (R-wave + D), which are spawned on multiple threads concurrently. Each of the left-side threads detect and extract features of one corresponding Q-wave, ISO2 and P-wave. Similarly, each of the right-side threads detect and extract features of one corre-sponding S-wave, ISO3 and T-wave. Threads are terminated after they cross their respective boundaries. After the termination, their output is used to detect ISO1 and store its duration in the global memory. Algorithm for Concurrent Embedded Wave Detection. A kernel function with a grid of six blocks is launched, where one block is executed on one SM. To execute it, each block gets information for average area calculation from the initial window analysis and features calculated for each beat in the previous module. Each thread in a block works on one beat. For the missing P-waves, the corresponding threshold area is checked to assign average features for the missing waveform. Otherwise, unchanged features are passed back to global memory. Detailed algorithms are given in Fig. 7. 5.2 Concurrent BGTG Construction To exploit a maximum number of available resources and SMs, 120 beats were divided into 20 beat blocks. A BGTG is constructed by each block by using feature-values of 20 beats and estimated values derived by initial window analysis. For each state of BGTG, two calculations are performed by each thread: (1) bivariate probability; and (2) transition probability to other states. Thread calculations are synchronized using barrier to ensure fully constructed BGTG before transferring data to global memory. Detailed algorithm is given in Fig. 8. 5.3 Concurrent Graph Matching The concurrent graph matching algorithm has three kernel functions: (1) Computing the most probable path in each BGTG; (2) pruning BGMMs that do not have an edge present in the BGTG; (3) classifying BGTG using MLE and the forward-backward algorithm. Concurrent Most Probable Path. On the GPU, one grid with six blocks is deployed. 
In each block, one state of BGTG is analyzed by each thread to calculate highest probability for that state. Information of highest probability is stored in form of pair (statei, statej) representing maximum probability from stateito statej. Final thread waits for barrier and creates MPP by joining all state-pairs for one BGTG. Integrating Markov Model, Bivariate Gaussian Distribution 581 Algorithm Concurrent denoising and feature extraction Input: ECG signal, D6 wavelet Output: denoised beats with features extracted { //Execute grid of blocks on GPU for window of 30 sec. forall block1: blockn //dispatch 5 second window to block { forall threads T1:Tm { //denoising and R-wave detection spawn Ti for denoising and R-wave detection; store derived information in memory, and wait; end barrier;} count number of R-waves from memory. Let it be k;} Co-begin forall threads T1 : TK{ spawn Ti to detect and store Q-wave ISO2 P-wave store derived information in memory; terminate if distance > R-wave-location end barrier; } forall threads TK+1 : T2*K{ spawn Ti to detect and store S-wave ISO3 T-wave store derived information in memory; terminate if distance > R-wave-location end barrier; } Co-end calculate ISO1 based on P-wave and Q-wave; store ISO1 information}} Algorithm Concurrent embedded waveform detection Input: Beats-and-features Output: updated-beats-and-features {//Execute grid of blocks on GPU forall block1 : blockm //execute m concurrent blocks with 20 beats/block for multiple SMs forall T1 : Tm // each thread works on one beat if (missing (P-wave)) { compute QRS area if (QRS area > threshold) { mark P-wave present with average amplitude, duration update beats-and-features; }} } Fig. 7. Algorithm for concurrent waveform detection Fig. 8. Algorithm for concurrent BGTG. 582 P. R. Gawde et al. Concurrent BGMM Pruning. To ?nd the subset of potential BGMMs for each BGTG, one grid of six blocks are launched and each block is executed on different SM. Each block takes input of one BGTGs and all BGMMs. Comparison of one BGTG with one BGMM is performed by each thread on one block. BGTGs with probabilities less than the threshold are pruned by the ?rst thread in the block. Next, concurrent threads are launched for each BGTG-BGMM pair. If states in BGTG and BGMM match, BGMM is considered as a potential match for the BGTG and is stored in the common vector SUB accessible to all the threads in the block. Concurrent Maximum Probability. To calculate the probabilities of matching each BGTG with the ?ltered BGMMs, a kernel function with a grid of six blocks is laun-ched. One BGTG is matched with the subset of ?ltered BGMMs in one block. The probability of matching one BGTG-BGMM pair is calculated by multiplying two values: (1) probability of state-value (bivariate Gaussian distribution) of BGTG pro-duced by BGMM using MLE [16], and (2) probability of transition probabilities in BGTG produced by BGMM using a forward-backward algorithm [15]. This probability is stored in a vector accessible to all the threads in the block. The outputs for each block are transferred back to the global memory. Detailed algorithm is given in Fig. 9. 6 Implementation The software was executed on a Dell machine having Intel(R) Xeon(R) dual core CPU E5-2680 @2.70 GHz 64-bit system with 128 GB RAM and CUDA enabled GeForce GTX 1050 ti GPU card. In GTX 1050 ti, there are six SMs. Each SM has four blocks with 32 cores per block, and 48 KB shared memory. There are 24 blocks, each having 1024 threads. 
In total, the GPU provides 768 cores for SIMT processing. We analyzed the MIT-BIH arrhythmia dataset [18] and the Creighton University Ventricular Tachyarrhythmia Database available at PhysioNet [21]. The data were split 60% for training and 40% for testing. The threshold used for area subtraction in the embedded-waveform detection algorithm was chosen experimentally after analyzing 3093 beats in MIT-BIH [18]. To derive execution efficiency, we compared the CPU-only implementation against the CPU + GPU combination with full CUDA resources. For the acquisition of real-time ECG data, signal filtering and processing, and feature extraction and analysis, we used MATLAB together with the WFDB software package provided by PhysioNet and written in C++ [21]. MATLAB was also used for statistical analysis. The GPU algorithms were written in C with the CUDA framework.

6.1 Performance Analysis and Discussion

We measured overall execution efficiency and improvement at the module level using a single CPU core versus 768 CUDA cores, as summarized in Table 3. We also measured the effect of utilizing different types of GPU memory on the overall improvement, as shown in Fig. 10. Based on the limitations and advantages of each memory type, we analyzed two approaches to exploit data parallelism: (1) a combination of constant memory and global memory; and (2) a combination of shared memory and constant memory.

Fig. 9. Algorithm for concurrent graph matching.

Table 3. Concurrent execution speedup of modules
Module             Single CPU   Concurrent (GPU)   Speedup
Preprocessing      950 ms       503 ms             1.8
Embedded wave      200 ms       102 ms             1.9
Transition graph   2800 ms      489 ms             5.7
Graph matching     3200 ms      492 ms             6.5
Total time         7150 ms      1586 ms            4.5

The execution times of the different modules are based on the analysis of 120 beats per execution over 500 iterations. The average time to execute the sequential BGMM approach on the CPU is around 7 s; the average time to execute the modules concurrently on the GPU is around 1.6 s. The overall improvement is 4.5x (77.8%) for arrhythmia subclassification. After the GPU implementation finishes in 1.6 s, the GPU remains idle for the next 28.4 s while the CPU collects the next 30-s window in real time. This idle time can be used to analyze other abnormalities in the same ECG data [19].

In the first memory-utilization approach, constant memory was used for repeatedly accessed read-only data; because constant memory is read-only, global memory was used for information exchange and for storing dynamic data during concurrent preprocessing. In the second approach, the faster shared memory within a single block was used as read/write memory during dynamic execution, and, because of its limited size, repeatedly accessed read-only data was kept in constant memory [19].

This experiment was run with 20 beats per BGTG. Sequential execution time increased linearly as the number of BGTGs increased. The time saved with the combination of shared memory and constant memory was greater than the time saved with the combination of constant memory and global memory, as expected, because shared memory is an on-chip cache with low latency. One further result was observed: the concurrent execution time also increased linearly up to six BGTGs, but beyond six BGTGs it became constant, possibly due to additional automatic allocation of CUDA resources or SMs.
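As an illustration of the second memory strategy (shared memory for block-local read/write data, with repeatedly accessed read-only data kept apart from global read/write traffic), the sketch below has each block copy its 20 beat-feature rows into on-chip shared memory, synchronize, and then operate on the low-latency copy. It is a Numba-based sketch under assumed array sizes, not the authors' CUDA C code, in which the read-only window estimates would typically reside in __constant__ memory.

# Illustrative sketch of memory approach 2: shared memory for the block-local tile,
# a separate read-only array of per-window estimates reused by every thread.
import numpy as np
from numba import cuda, float32

BEATS_PER_BLOCK = 20
N_FEATURES = 8                     # illustrative feature count per beat

@cuda.jit
def bgtg_block_kernel(beat_features, window_estimates, partial):
    tile = cuda.shared.array(shape=(BEATS_PER_BLOCK, N_FEATURES), dtype=float32)
    t = cuda.threadIdx.x
    row = cuda.blockIdx.x * BEATS_PER_BLOCK + t
    if row < beat_features.shape[0]:
        for f in range(N_FEATURES):
            tile[t, f] = beat_features[row, f]     # global -> shared copy
    cuda.syncthreads()                             # barrier before using the tile
    if row < beat_features.shape[0]:
        s = 0.0
        for f in range(N_FEATURES):
            # read-only per-window estimates are reused by every thread in the block
            s += tile[t, f] - window_estimates[f]
        partial[row] = s                           # per-beat deviation written back

feats = cuda.to_device(np.random.randn(120, N_FEATURES).astype(np.float32))
est = cuda.to_device(np.zeros(N_FEATURES, dtype=np.float32))
out = cuda.device_array(120, dtype=np.float32)
bgtg_block_kernel[6, BEATS_PER_BLOCK](feats, est, out)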
The plateau beyond six BGTGs indicates that additional load on the GPU is automatically compensated by the additional allocation of CUDA resources. This could prove useful for analyzing other aspects of ECG abnormalities without increasing execution time.

6.2 Classification Accuracy

We counted true positives, false positives, true negatives and false negatives to compute sensitivity as TP/(TP + FN) * 100 and specificity as TN/(TN + FP) * 100. True positives (TP) are positive detections that correspond to the annotations of a specialist. False positives (FP) are detections that do not correspond to the annotations of a specialist. True negatives (TN) are beats that were not annotated as ventricular arrhythmia by a physician and were not identified by the algorithm. False negatives (FN) are heartbeats that were annotated as arrhythmia by a specialist but were not detected by the algorithm. Table 4 shows the accuracy of our technique when using the combination of shared memory and constant memory. Both sensitivity and specificity are high for all subclasses.

Fig. 10. Effect of memory utilization on speedup.

Table 4. Accuracy of arrhythmia subclassification
                                  Sequential approach              GPU-based concurrent approach
Class              Subclass       Sensitivity (%)  Specificity (%)  Sensitivity (%)  Specificity (%)
Ventricular        AFib           97.3             93.6             97.2             96.4
Ventricular        AFlu           92.3             94.3             92.1             94.3
Ventricular        AVNRT          95.2             96.9             95.6             97.0
Ventricular        EAT            98.6             94.0             98.6             94.0
Supraventricular   VTach          94.0             96.3             94.0             96.4
Supraventricular   VFlu           91.3             98.2             91.6             98.3
Supraventricular   VFib           98.6             99.6             98.5             99.1

7 Related Works

Several researchers have exploited parallel techniques for various subtasks such as signal processing with filters [12], wavelet transforms [13], and classification of beats into supraventricular and ventricular arrhythmia [14]. Lopes et al. [22] proposed ventricular arrhythmia diagnosis using a parallel implementation of neural networks, focusing on parallelizing back-propagation. Their technique is limited to PVC beat detection, does not address real-time classification, and does not detect embedded P-waves, which reduces accuracy; the sensitivity obtained with their approach is 94.5% [17], compared with 98.8% for ours. Another neural-network-based classification approach has been proposed by Li et al. [11]; it is limited to separating supraventricular from ventricular beats using GPUs. Phaudphut et al. obtained a sensitivity of 88.0% [13] in PVC beat detection, compared with 99.3% for our approach; in addition, we diagnose all seven major subclasses in real time. Some researchers have used the GPU for denoising and feature extraction [8, 12]. Domazet et al. [12] proposed an optimization with shared and constant memory for a DSP filter for ECG denoising. Although our goal is much broader, we tested our approach with two memory-optimization techniques; as expected, the combination of shared memory and constant memory showed improvements, due to its lower latency, compared with the combination of global and constant memory.

8 Limitations and Future Directions

The current system uses only lead II for arrhythmia analysis. The model could be extended to analyze three-lead signals on embedded GPUs, such as NVIDIA Jetson [5] based wearable devices, to handle ischemia, heart abnormalities due to electrolyte imbalances, and myocardial infarction in real time.
We are currently extending our GPU based BGMM to a GPU based multivariate model to diagnose ischemia, hyperkalemia and myocardial infarction using three leads in real-time. References 1. Lerma, C., Glass, L.: Predicting the risk of sudden cardiac death. J. Physiol. 594(9), 2445– 2458 (2016) 2. Rautaharju, P.M., Surawicz, B., Gettes, L.S.: AHA/ACCF/HRS recommendations for the standardization and interpretation of the electrocardiogram: part IV. J. Am. Coll. Cardiol. 53 (11), 982–991 (2009) 3. Garcia, T.B., Miller, G.T.: Arrhythmia Recognition: The Art of Interpretation. Jones and Bartlett, Burlington (2004) 4. Abtahi, F., Snäll, J., Aslamy, B., Abtahi, S., Seoane, F., Lindecrantz, K.: Biosignal pi, an affordable open-source ECG and respiration measurement system. Sensors 15(1), 93–109 (2014) 5. Page, A., Attaran, N., Shea, C., Homayoun, H., Mohsenin, T.: Low-power manycore accelerator for personalized biomedical applications. In: ACM Proceedings of the 26th Edition on Great Lakes Symposium on VLSI, Boston, pp. 63–68 (2016) 6. Mahmoodabadi, S.Z., Ahmadian, A., Abolhasani, M.D.: ECG feature extraction using Daubechies wavelets. In: Proceedings of the Fifth IASTED International Conference on Visualization, Imaging and Image Processing, Benidorm, pp. 343–348(2005) 7. Sayadi, O., Mohammad, B., Shamsollahi, M.B., Clifford, G.D.: Robust detection of premature ventricular contractions using a wave-based Bayesian framework. IEEE Trans. Biomed. Eng. 57(2), 353–362 (2010) 8. Jun, T.J., Park, H.J., Yoo, H., Kim, Y.H., Kim, D.: GPU based cloud system for high-performance arrhythmia detection with parallel k-NN algorithm. In: Proceedings of the 38th Annual International Conference of the. IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, pp. 5327–5330 (2016) 9. Gawde, P.R., Bansal, A. K., Nielson, J.A.: ECG analysis for automated diagnosis of subclasses of supraventricular arrhythmia. In: Proceedings of International Conference on Health Informatics and Medical Systems, Las Vegas, pp. 10–16 (2015) 10. Gawde, P.R., Bansal, A.K., Nielson, J.A.: Integrating Markov model and morphology analysis for ?ner classi?cation of ventricular arrhythmia in real-time. In: IEEE International Conference on Biomedical & Health Informatics, Orlando, pp. 409–412 (2017) 11. Li, P., Wang, Y., He, J., Wang, L., Tian, Y., Zhou, T.: High-performance personalized heartbeat classi?cation model for long-term ECG signal. IEEE Trans. Biomed. Eng. 64(1), 78–86 (2017) 12. Domazet, E., Gusev, M., Ristov, S.: Optimizing high performance CUDA DSP ?lter for ECG signals. In: Proceedings of the 27th DAAAM International Symposium in Intelligent Manufacturing and Automation, Vienna, pp. 0623–0632 (2016) Integrating Markov Model, Bivariate Gaussian Distribution 587 13. Phaudphut, C, So-In, C., Phusomsai. W.: A parallel probabilistic neural network ECG recognition architecture over GPU platforms. In: Proceedings of the 13th International Joint Conference on. Computer Science and Software Engineering (JCSSE), Khon Kaen, pp. 1–7 (2016) 14. Fan, X., He, C., Chen, R., Li, Y.: Toward automated analysis of electrocardiogram big data by graphics processing unit for mobile health application. IEEE Access 5, 17136–17148 (2017) 15. Russell, S., Norwig, P.: Arti?cial Intelligence—A Modern Approach, 3rd edn. Prentice Hall, Upper Saddle River (2010) 16. Psutka, J.V., Psutka J.: Sample size for maximum likelihood estimates of Gaussian model. In: International Conference on Computer Analysis of Images and Patterns, pp. 462–469. 
Springer, Cham (2015) 17. Everitt, B., Skrondal, A.: The Cambridge Dictionary of Statistics, vol. 106. Cambridge University Press, Cambridge (2002) 18. MIT-BIH Arrhythmia dataset. https://www.physionet.org/physiobank/database/MIT-BIH/ 19. Nvidia, C.: C Programming Guide PG-02829–001_v9.1, March 2018. http://docs.nvidia. com/cuda/pdf/CUDA_C_PrograBGMMing_Guide.pdf 20. Tallarida, R.J., Murray, R.B.: Area Under a Curve: Trapezoidal and Simpson’s Rules Manual of Pharmacologic Calculations, pp. 77–81. Springer, New York (1987) 21. Creighton University Ventricular Tachyarrhythmia Database. https://physionet.org/ physiobank/database/cudb/ 22. Lopes, N., Ribeiro, B.: Fast pattern classi?cation of ventricular arrhythmias using graphics processing units. In: Iberoamerican Congress on Pattern Recognition. LNCS, vol. 5856, pp. 603–610. Springer, Heidelberg (2009) 588 P. R. Gawde et al. Identification of Glioma from MR Images Using Convolutional Neural Network Nidhi Saxena(B) , Rochan Sharma, Karishma Joshi, and Hukum Singh Rana University of Petroleum and Energy Studies, Dehradun, India nsaxena117@gmail.com Abstract. This paper presents a novel approach of classifying the type of glioma using convolutional neural network (CNN) on 2D MR images. Glioma, most common type of malignant brain tumor, and can be clas-si?ed according to the type of glial cells a?ected. The types of gliomas are, namely, actrocytoma, oligodendroglioma and glioblastoma multi-forme (GBM). Various image processing and pattern recognition tech-niques may be used for cancer identi?cation and classi?cation. Though in recent years deep learning has been proved to be e?cient in computer aided diagnosis of diseases. Convolutional Neural Networks, a type of deep neural network which is generally used for classi?cation of images, contains multiple sets of conv-pool layers for feature extraction, followed by fully-connected (FC) layers that make use of extracted features for classi?cation. Keywords: Glioma · Astrocytoma · Oligodendroglioma Glioblastoma multiforme (GBM) MRI and convolutional neural network (CNN) 1 Introduction Glioma is a major type of brain tumor that can occur in all age groups though mostly seen in adults. It originates in glial cells of brain. Glial cells are of four types namely - astrocytes, oligodendrocytes, microglia and ependymal cells. Accordingly astrocytoma, oligodendroglioma and glioblastoma multiforme are the types of glioma cancers as shown in Fig. 1. These tumors can be cured if detected at early stage but some of the fast growing gliomas can be dangerous. The most common and aggressive type of brain tumor is glioblastoma multiforme or GBM, which is a malignant grade IV glioma. In early-stage glioblastoma, as per MRI ?ndings, are ill-de?ned small lesions with little or no mass e?ect, and having no or subtle contrast enhancement. Within several months, these lesions develop typical MRI ?ndings such as a heterogeneous enhanced bulky mass with central necrosis. The average period from the initial to ?nal scan in diagnosis of glioblastoma has been 4.5 months [1]. Magnetic Resonance Imaging (MRI) is one of the commonly used modalities used for diagnosing brain tumors. As com-pared to other diagnostic methods, like computed tomography scan, ultrasound, .o c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 589–597, 2019. https://doi.org/10.1007/978-3-030-02686-8_44 590 N. Saxena et al. 
etc., MRIs are safe, non-invasive and re?ects the true dimensions of organ/tissue, therefore in imaging of the brain, it is widely considerable [2]. Convolutional neural networks (CNN) [3] consists of conv-pool layers followed by fully-connected (FC) layers. One conv-pool layer consists of a convolutional layer and a pooling layer. The convolutional layer is used to detect hierarchical features from images, whereas pooling layer is used to forward the detected fea-tures further in the network [4]. In the proposed model, convolution operations are performed with same padding (size of feature space remains same) and pool-ing is performed with valid (or no) padding (size of feature space is reduced). Conv-pool layers detects useful features and forward them to FCs where classi?- cation is performed. Unlike neural networks, in CNN each layer grid is connected to only a limited number of layers. In CNN, the entire network can be put into the GPU memory and the hardware cores can be used to boost network speed using deep learning tools. They have a lot of applications in medical diagnosis involving image segmentation independent of morphology. Lesions are detected and classi?ed accordingly using CNNs and type and severity of diseases can be predicted. Fig. 1. MR scans of types of gliomas: a. Astrocytoma, b. Olidendroglioma and c. Glioblastoma multiform or GBM. 2 Literature Review 2.1 Segmentation In medical domain, segmentation is the technique for detection and separation of a part from medical image (can be a lesion or an organ) that can be used for further diagnosis. Segmentation proves to be very helpful for monitoring disease progression, plan treatment strategies and prediction of treatment outcomes. It can be done in many ways like by thresholding or by developing a heuristic algorithm as shown by Rajnikanth et al. [5]. Their work focuses on developing Hamiltonian Mechanics 591 a heuristic algorithm to segment the tumor region from 2D brain MRI images. Initially, preprocessing is done which enhances the tumor region in MR scans followed by multi-level thresholding to segment the lesion. Then accuracy is calculated on di?erent slices of MR images, which is above 95% for all types of MRI slices. Deep learning can also be applied for segmentation of lesion and detection of cancer from modalities like Computed Tomography (CT) scan, ultrasound and MRI. Farnaz et al. [6], trained a deep convolutional neural network (DCNN) for segmentation of lesions in brain from MR images. The proposed model was 6 layers deep (5 convolution layers and 1 FC layer) showed that the DICE similarity coe?cient matric was 0.90 for complete, 0.85 for core and 0.84 for enhancing regions on BRATS 2016 dataset. Segmentation, sometimes may need humans to provide some high level infor-mation needed to extract the segmented region from images. This type of segmentation is called interactive segmentation [7]. Wang et al. [8] performed interactive medical image segmentation by ?ne-tuning a pre-trained CNN for segmenting multiple organs from 2D fetal MR slices (here two types of organs were annotated for training) and also on 3D segmentation of brain tumor core and whole brain tumor (here the brain tumor core was annotated in one MR sequence). The image speci?c ?ne-tuning made CNN model adaptive to a speci?c test image which can be either unsupervised or supervised. Also, a weighted loss function considering network and interaction based uncertainty for ?ne-tuning was proposed. 
Experiments show that image speci?c ?ne tuning improves seg-mentation performance. 2.2 Classification In medical diagnosis, aim is to identify the presence of a disease in a person on the basis of scans of a particular organ along with analyzing patient’s medical history. To detect the disease by analyzing an image, pre-processing may prove to be bene?cial. Sadegi-Naini et al. [9] proposed a method for feature extraction (a pre-processing step) and data analysis to characterize breast lesion by using texture based features in ultrasound scans. Among 78 patients, 46 and 32 patients were con?rmed with benign and malignant lesions respectively based on radiology and pathology reports. Though MR is an e?cient modality, still to apply Computer Aided Diagnosis (CAD) sometimes pre-processing methods such as feature selection, extraction or representation is required. Mingxia et al. [10] proposed an anatomical landmark based feature representation which automatically extracts features in brain MR images for the purpose of disease diagnosis. Experimental results showed that the proposed method improves the performance of disease classi?cation. An approach to ?nd the severity of tumor is to ?rst segment tumor region from the scan then classify it as malignant or benign. Deckota et al. [11] proposed a system which identi?es the cancerous nodule from the lung CT scan images using watershed segmentation for detection and support vector machine (SVM) for classi?cation of nodule as malignant or benign. The proposed model includes 592 N. Saxena et al. 6 stages: image pre-processing, segmentation of the pre-processed image, feature extraction, feature reduction using PCA, classi?cation using SVM and evaluation of the classi?cation. The model detects cancer with 92% accuracy classi?er has accuracy of 86.6%. In a classi?cation problem of medical diagnosis, accuracy is generally mea-sured in terms of speci?city and sensitivity and both are directly proportional to the accuracy of the classi?er. Blumenthal et al. [12] proposed an automatic classi?cation for tumor and nontumor cells using support vector machine (SVM) classi?er which is trained on 4 components enhancing and nonenhancing, tumor and nontumor. Classi?cation results were evaluated using 2 fold cross validation analysis of the training set and MR spectroscopy. High sensitivity and speci?city (100%) were obtained within the enhancing and nonenhancing areas. Zakarakhi et al. [13] also proposed a scheme to classify brain tumor type and grade using MR images. The proposed scheme consists of several steps includ-ing ROI de?nition, feature extraction, feature selection and classi?cation. The extracted features include tumor shape and intensity characteristic as well as rotation invariant texture features. Feature subset selection is performed using SVM with recursive feature elimination. The binary SVM classi?cation accuracy, sensitivity and speci?city were respectively 85%, 87% and 79% for discrimina-tion of metastases from gliomas and 88%, 85% and 96% for discrimination of high-grade from low-grade neoplasms. Deep learning can be used e?ciently for identi?cation of di?erent types of substances in organ scans, as shown by Fang Liu et al. [14] as they designed a deep Magnetic Resonance Attenuation Correction (MRAC) for classi?cation of air, bone and soft tissue in CT scans of various organs. Their method provided an accurate pseudo CT scan with a mean Dice coe?cient of 0.971 ± 0.005 for air, 0.936 ± 0.011 for soft tissue and 0.803 ± 0.021 for bone. 
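Before turning to the proposed method, the conv-pool plus FC structure summarized in Sect. 1 (same-padding 3 x 3 convolutions, valid-padding 2 x 2 max-pooling, and fully-connected layers ending in a classifier) can be illustrated with a short tf.keras sketch. The layer widths mirror those reported for the proposed model in Sect. 3.2 (32 to 512 kernels and FC layers of 2048/512/64/3 with softmax), but this is an illustrative reconstruction rather than the authors' TensorFlow code.

# Minimal tf.keras sketch of the conv-pool + FC structure described in the paper:
# same-padding 3x3 convolutions, valid-padding 2x2 max-pooling, batch normalization,
# and a softmax output over the three glioma classes.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_glioma_cnn(input_shape=(128, 128, 1), num_classes=3):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (32, 64, 128, 256, 512):              # five conv-pool blocks
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(pool_size=2, padding="valid"))
    model.add(layers.Flatten())                          # 4 x 4 x 512 = 8192 features, as reported
    for units in (2048, 512, 64):                        # FC layers with batch normalization
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.BatchNormalization())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_glioma_cnn()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])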
The most common application of deep learning is detecting whether or not a person has a particular disease (most often cancer). Roffman et al. [15] proposed a skin cancer prediction model using an artificial neural network (ANN) whose training sensitivity was 88.5% and specificity 62.2% for the prediction of non-melanoma skin cancer (NMSC); on the validation set, sensitivity was 86.2% and specificity 62.7%. Makde et al. [16] used a deep neural network architecture for the detection of tumors in lung CT scans and brain MR images, classifying images as tumorous or non-tumorous. Classification accuracy was above 97% for both CT and MR images, and the AlexNet and ZFNet frameworks were compared for the same purpose.

3 Method

3.1 Implementation Details

The dataset used is REMBRANDT [17,18], which consists of MR scans of 130 patients suffering from gliomas of different types and at different stages. From this dataset, a total of 38,952 images of size 128 x 128 were used. Five-fold cross-validation is applied with a batch size of 512 images. Label 0 was astrocytoma, 1 was GBM, and 2 was oligodendroglioma. The test split was 4096 images out of the 38,952. For training, 45 epochs are used for each validation fold. The proposed CNN model is implemented using the TensorFlow framework on a system with 4 CPUs, 15 GB of memory, and 2 NVIDIA K80 GPUs running Ubuntu 16.05.

Fig. 2. Architecture of the proposed CNN.

3.2 Convolutional Neural Network

In classical neural networks, features had to be provided to the network for classification. CNNs are a special type of neural network in which the earlier layers extract features and the later layers perform classification using those features. In general, the initial layers of a CNN comprise multiple conv-pool layers followed by FCs. The last (output) layer is either a sigmoid layer (for binary classification) or a softmax layer (for multi-class classification). CNNs have proved very effective at extracting features from images and eliminate the need to provide hand-crafted features to the network. Training a CNN is computationally expensive, but using a GPU can speed up the process. The deeper the network, the greater its classification power, owing to the additional non-linearities and the better quality of local optima [19]. However, convolutions with 3D kernels are computationally expensive compared with 2D kernels, which hampers the addition of more layers. Deeper network variants that are implicitly regularized and more efficient can therefore be designed simply by replacing each layer of common architectures with more layers that use smaller kernels [20]. However, deeper networks are more difficult to train: it has been shown that the forward (neuron activations) and backward (gradients) propagated signals
Adam is generally regarded as being fairly robust to the choice of hyper parame-ters, though the learning rate sometimes needs to be changed from the suggested default. Batch Normalization: In deep CNNs, each layer gets di?erent inputs or acti-vations which may result in inputs belonging to di?erent distributions at di?erent layers. This problem of internal covariant shift [23] is solved by applying batch normalization. It means, inputs at each layer are normalized so that they are all on same scale and hence belong to same distribution. Thus, batch normalization increases the adaptiveness of later layers learning. The batch normalization is applied in all the layers in the proposed architecture. Architecture: The CNN model (as shown in Fig. 2) is developed for 128 × 128 grayscale 2D MR images, having 5 conv-pool layers and 4 fully-connected (FC) layers. All the conv-pool layers used 3 × 3 kernel size and stride as 1 and max-pool layers used 2 × 2 kernel size and stride as 2. Activation function used in all the layers is ReLu (Recti?ed Linear Unit). First layer used two dimensional 32 kernels followed by max-pool of stride and same padding. Second layer used 64 kernels, followed by 128 kernels in the third layer, 256 kernels in the forth convolutional layer and ?nally 512 kernels in last convolutional layer (as shown in Fig. 4). Batch normalization is performed in all conv-pool and FC layers. After ?ve layers, there are 8192 features which are then ?attened in FC6 and converged to 2048 features in FC7, followed by 512 in FC8, then 64 in FC9 and ?nally to 3, which is the total number of classes. In last FC layer softmax, which is a probabilistic activation, is applied as classi?cation is done for three classes, which are three di?erent types of gliomas. 3.3 Results The cost minimization in one validation set is shown in Fig. 3. The proposed model is executed with 5-fold cross-validation and the overall cost minimization is shown in Fig. 4. The reason for observed ?uctuations in Fig. 4 is the applied validation. The cost in last epoch of one validation set is much lower than the cost at ?rst epoch while training the next validation set. The model gives a training accuracy of 63.17%, validation accuracy of 56.67% and test accuracy of 65.24%. Hamiltonian Mechanics 595 Fig. 3. Cost plot. Fig. 4. Cost when validation is applied. 4 Conclusion This paper proposes a novel CNN based model for identi?cation of glioma based on their origin in brain. To the best of our knowledge, this is the ?rst time deep learning is applied for identi?cation of glioma. The most common brain tumor is gliblastoma multiforme which can be classi?ed using the proposed model. GBM is grade IV malingnant tumor. One shortcoming of the proposed model is that some astrocytomas and oligodendrocytomas are misidenti?ed as GBM. In future, with further improvement, this model may assist radiologists to predict the type of glioma a person is su?ering from and treatment can be given accordingly. 596 N. Saxena et al. 5 Future Scope Grade determines the severity of the disease. As mentioned, glioma has four grades. Grade IV is the most malignant stage and is also called gliblastoma multiforme or just high grade glioma. This paper detects the type of glioma from MR images using CNN. Further, a di?erent CNN architecture can be used for the detection of grade of glioma. References 1. Ideguchi, M., Kajiwara, K., Goto, H., Sugimoto, K., Nomura, S., Ikeda, E., Suzuki, M.: MRI ?ndings and pathological features in early-stage glioblastoma. J. 
Neu-roOncol. 123, 289–297 (2015) 2. El-Gamal, F., Elmogy, M., Atwan, A.: Current trends in medical image registration and fusion. Egypt. Inform. J. 17, 99–124 (2016). https://doi.org/10.1016/j.eij.2015. 09.002 3. Lecun, Y., Bottou, L., Bengio, Y., Ha?ner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998) 4. Zeiler, M., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833 (2014) 5. Rajnikanth, V., Fernandes, S., Bhushan, B., Sunder, N.: Segmentation and anal-ysis of brain tumor using tsallis entropy and regularised level set. In: 2nd Inter-national Conference on Micro-Electronics, Electromagnetics and Telecommunica-tions. Springer, Singapore (2018) 6. Hoseini, F., Shahbahrami, A., Bayat, P.: An e?cient implementation of deep con-volutional neural networks for MRI segmentation. J. Digit. Imaging 31, 738 (2018) 7. McGuinness, K., O’Connor, N.: A comparative evaluation of interactive segmen-tation algorithms. Pattern Recognit. 43, 434–444 (2010) 8. Wang, G., Li, W., Zuluaga, M., Pratt, R., Patel, P., Aertsen, M., Doel, T., David, A., Deprest, J., Ourselin, S., Vercauteren, T.: Interactive medical image segmenta-tion using deep learning with image-speci?c ?ne-tuning. IEEE Trans. Med. Imag-ing. 37, 1562 (2018) 9. Sadeghi-Naini, A., Suraweera, H., Tran, W., Hadizad, F., Bruni, G., Rastegar, R., Curpen, B., Czarnota, G.: Breast-lesion characterization using textural features of quantitative ultrasound parametric maps. Sci. Rep. 7, 13638 (2017) 10. Liu, M., Zhang, J., Nie, D., Yap, P., Shen, D.: Anatomical landmark based deep feature representation for MR images in brain disease diagnosis. IEEE J. Biomed. Health Inform. 22, 1476 (2018) 11. Devkota, B., Alsadoon, A., Prasad, P., Singh, A., Elchouemi, A.: Image segmen-tation for early stage brain tumor detection using mathematical morphological reconstruction. Procedia Comput. Sci. 125, 115–123 (2018) 12. Blumenthal, D., Artzi, M., Liberman, G., Bokstein, F., Aizenstein, O., Ben Bashat, D.: Classi?cation of high-grade glioma into tumor and nontumor components using support vector machine. Am. J. Neuroradiol. 38, 908–914 (2017) 13. Zacharaki, E., Wang, S., Chawla, S., Soo Yoo, D., Wolf, R., Melhem, E., Davatzikos, C.: Classi?cation of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn. Reson. Med. 62, 1609–1618 (2009) Hamiltonian Mechanics 597 14. Liu, F., Jang, H., Kijowski, R., Bradshaw, T., McMillan, A.: Deep learning MR imaging-based attenuation correction for PET/MR imaging. Radiology 286, 676– 684 (2017) 15. Ro?man, D., Hart, G., Girardi, M., Ko, C., Deng, J.: Predicting non-melanoma skin cancer via a multi-parameterized arti?cial neural network. Sci. Rep. 8, 1701 (2018) 16. Makde, V., Bhavsar, J., Jain, S., Sharma, P.: Deep neural network based classi?ca-tion of tumourous and non-tumorous medical images. In: International Conference on Information and Communication Technology for Intelligent Systems, pp. 199– 206 (2017) 17. Scarpace, L., Flanders, A.E., Jain, R., Mikkelsen, T., Andrews, D.W.: Data From REMBRANDT. The Cancer Imaging Archive (2017) 18. Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Ma?tt, D., Pringle, M., Tarbox, L., Prior, F.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26, 1045–1057 (2013) 19. 
Choromanska, A., Hena?, M., Mathieu, M., Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Arti?cial Intelligence and Statistic, pp. 192–204 (2015) 20. Kamnitsas, K., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D., Rueck-ert, D., Glocker, B.: E?cient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017). https:// doi.org/10.1016/j.media.2016.10.004 21. Glorot, X., Bengio, Y.: Understanding the di?culty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Arti?cial Intelligence and Statistics, pp. 249–256 (2010) 22. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014) 23. Io?e, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015) Array of Things for Smart Health Solutions Injury Prevention, Performance Enhancement and Rehabilitation S. M. N. Arosha Senanayake1,2(?) , Siti Asmah @ Khairiyah Binti Haji Raub2 , Abdul Ghani Naim1,2 , and David Chieng3 1 Institute of Applied Data Analytics, University of Brunei Darussalam, Gadong BE1410, Brunei arosha.senanayake@ubd.edu.bn 2 Faculty of Science, University of Brunei Darussalam, Gadong BE1410, Brunei 3 Wireless Innovation, MIMOS Berhard, Technology Park Malaysia, Kuala Lumpur, Malaysia Abstract. Data visualization on wearable devices using cloud servers can provide solutions for personalized healthcare monitoring of general public leading to smart nation. The objective of this research is to develop personalized healthcare IoT assistive devices/tools for injury prevention, performance enhancement and rehabilitation using an Intelligent User Interfacing System. It consists of Array of Things (AoT) which interconnects hybrid prototypes built using di?erent wearable measurement and instrumentations multimodel sensor system for transient and actual health status and classi?cation. Android platforms have been used to prove the success of AoT using national athletes and soldiers with whom were permitted the implementation of a knowledge base encapsulated reference/benchmarking massive retrieve, retain, reuse and revise health pattern sets accessible via case base reasoning cloud storage. Two case studies were conducted for injury prevention and rehabilitation and performance enhancement of soldiers and athletes using smart health algorithms. Validation and testing were carried out using Samsung Gear S3 smart watches in real time. Keywords: Array of Things (AoT) · Personalize healthcare Multimodel sensor system · Transient health · Smart health 1 Introduction Array of Things concept was ?rstly introduced in Smart Chicago project [1]. Their concept was the designing of range of cyber physical devices as measurement and instrumentation systems at urban scale based on the principle of array of telescopes and IoT. In [2], authors summarize Parkinson Disease (PD) patients monitoring in the home setting using wearable and ambient sensors. The technology includes a wireless unit strapped around the wrist, Band-Aid-like sensors attached to the lower limbs, a wearable camera worn as a pendant, a smart watch, and a mobile phone clipped on the belt used as gateway to relay the data to the cloud to assess speci?c functions (using its embedded sensors) as well as to communicate with the patient (using customized apps). 
The inte- gration of wearable technology with smart devices enables the remote monitoring of © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 598–615, 2019. https://doi.org/10.1007/978-3-030-02686-8_45 patients with PD and real-time feedback to clinicians, family/caregivers, and the patients themselves. Three Machine Learning (ML) algorithms were proposed to generate knee angle patterns in sagittal plane, which is one of the joints used during the walk. The Extreme Learning Machine algorithm outperformed against Arti?cial Neural Network and Multi-output Support Vector algorithms and can generate a speci?c reference of normal knee pattern depending on individual’s characteristics and walking speed. This speci?c refer- ence provides a personalized gait analysis [4]. Having done extensive research work on applying virtual measurement and instru- mentation for human motion analysis during past two decades [5–11], this paper intro- duces generalized frame work for data visualization on wearable devices for personal- ized healthcare using wearable sensors and its data fusion; Array of Things for smart health solutions, as illustrated in Fig. 1. Fig. 1. System overview of Array of Things for smart health solutions. Smart health solution architecture consists of wearable devices for personalized healthcare services and technologies and cloud server technologies in order to visualize smart health data fused and update, repair and remove transient health data based on actual health status using personalized wrist band data center. Thus, this paper is struc- tured from general system architecture introduced leading to speci?c application domains used in order to prove its services. Smart health solution architecture is articu- lated using Hybrid System Architecture Platform (HSAP) that is the novel platform for Array of Things (AoT) devices/tools composed of a set of cloud computing based sensor, processing, control, and data services integrating AoT and cloud computing into a single framework Thus, this article describes the HSAP system architecture in detail using its core components; smart data fusion, smart data analytics and deep learning. HSAP allows to acquire personalize health pattern set using wearable devices which requires multimodel sensory mechanisms to extract feature set, integrate feature set and transform it using Array of Things for Smart Health Solutions Injury Prevention 599 data fusion techniques such a way that knowledge base (KB) of an individual person is formed. Formed KB consists of pre-injury (healthy) pattern set, injury pattern set and post-injury pattern set which will be updated using personalized wrist band data center primarily using worn IoTs. Virtual measurements and instrumentation technologies (LabVIEW) is used as the platform to interface AoTs connected to cloud server by implementing Intelligent Graphical User Interfacing System (IGUIS) in order to acquire current (actual) health data pattern set on site, online and real time to update KB using case base reasoning such a way that cloud computing takes care of providing appropriate services; reactive care, episodic centric and clinic centric for performance enhancement, injury prevention and rehabilitation. 
Thus, KB interfaced with smart health algorithms processed using cloud computing facilitates the classi?cation of current health status considered as actual health status while cloud storage maintains transient health status of each individual using historic pattern set already available in cloud storage. Based on the limited storage available in worn IoTs, Samsung Gear S3 watch provides 2 GB free space, transient health status (classi?cation) is stored in a queue to continuously update the classi?cation of individual using actual health status on site, online and real time. 2 Smart Health Solution Architecture 2.1 Rationale On health or lifestyle monitoring, harvesting of motion data and context reasoning is often a complex task. IntelliHealth Solutions was introduced to assess, monitor and to provide feedback on active lifestyle focusing generalized solution for normal Brunei citizens [4]. While IntelliHealth solutions has already achieved the establishment of reference standards of Brunei Citizens based on soldiers and national athletes (healthy citizens) using intelligent knowledge base formed (resident pattern storage in a cloud server) [5], the aim of this research is to develop a transient wearable healthcare solutions for transient pattern storage in real time with shared resource allocation using cloud technology for resident pattern storage already formed using intelligent knowledge base. This will allow real time monitoring of human test subject while performing real time walking, jogging, running and cycling. So far, resident pattern storage of soldiers and athletes has been established using smart data and decision fusion consisted of smart data analytics, deep learning, case based reasoning and virtual measurement and instru- mentation technologies [6]. Thus, the achievement of the development of wearable motion interfacing and reasoning devices for general public with its own vision ‘towards active healthy lifestyle’ facilitates the monitoring of gait and rehabilitation of initially ASEAN obese community with pilot study on going in Brunei as the center, Malaysia and Vietnam under the ASEAN Institutes of Virtual Organization at National Informa- tion and Communications Technology (NICT), Tokyo, Japan with the title “IoT system for Public Health and Safety Monitoring with Ubiquitous Location Tracking”. Heavy computations required for motion data reasoning and position estimation result in high energy consumption. Together with the needs to maintain a reliable data connection anytime anywhere, a practical battery design is becoming a huge challenge for such wearable devices. Certain computations need to be o?oaded to a cloud server 600 S.M.N. Arosha Senanayake et al. without signi?cantly compromising the response time. In today’s highly digitized society, cloud technologies play a critical role in preserving health and safety of citizen especially women, children and the elderly. Over the last few years, there is a growing needs for monitoring the citizen’s lifestyle including their health status. Smart Health will have a direct impact on society leading to a smart society. The ultimate achievement of AoT for smart health solutions works as a service provider for the wellbeing of public. The AoT for quality life style have not been addressed exten- sively in recent years. 
Recently developed devices were not a great success due to three main critical issues not appropriately integrated into customized devices targeting a particular society needs (ASEAN countries); Intelligent User Interfaces, information fusion and real time biofeedback control. Hence, the goal of Smart Health solutions is to design, implement and build AoT devices/tools which incorporate hybrid tools; intel- ligent user interfacing systems and real time biofeedback control systems embedded with information fusion. Smart Health will have a direct impact on society leading to a smart society. The ultimate achievement of AoT for smart health solutions works as a service provider for the wellbeing of public. The AoT for quality life style have not been addressed exten- sively in recent years. Recently developed devices were not a great success due to three main critical issues not appropriately integrated into customized devices targeting a particular society needs (ASEAN countries); Intelligent User Interfaces, information fusion and real time biofeedback control. Hence, the goal of Smart Health solutions is to design, implement and build AoT devices/tools which incorporate hybrid tools; intel- ligent user interfacing systems and real time biofeedback control systems embedded with information fusion. Thus, AoT for smart health solutions embeds solutions for injury prevention, performance enhancement and rehabilitation using reactive care services, episodic response services and clinic centric services respectively. Intelligent Graphical User Interfacing System (IGUIS) was built to integrate these services and tested using soldiers and national athletes successfully as reported in [6]. IGUIS was built using virtual measurement and instrumentation tools provided by LabVIEW and using Support Vector Machines (SVM) interfaced with case base reasoning. 2.2 System Architecture As shown in Fig. 1, the overall system architecture is mainly divided into two sub-systems; Wearable Device and Server (Cloud) which are interconnected via communi- cation protocols with two critical parameters; one related to IoT(s) active from Array of Things (AoT) and the status. Initially, wearable device considered is Android based platform, but recon?guring to other wearable platforms is allowed using customizing tools integrated. Wearable device contains multimodal healthcare system on device, personalized wrist band data center and AoT platforms. AoT is designed in order to accommodate all embedded platforms arising from multimodal healthcare system from di?erent devices. It is imple- mented using real time embedded system interfaced with IGUISs. Hence, AoT uses daisy chain methods to interface with all IoT devices encapsulated under smart health Array of Things for Smart Health Solutions Injury Prevention 601 solutions. This will allow the connectivity of future IoTs to be developed with no addi- tional hardware. In order to facilitate the connectivity with Cloud servers, personalized communication protocol is built. Communication protocol is the interface to the server usually a cloud server con?g- ured to the IoT in consideration. It carries two important information from Android device currently active; IoT and Status. IoT information contains personalized health protocol headers which allows to recon?gure and to synchronize with corresponding smart health data in the cloud server. The status is the result of actual health status of actual human test subject in consideration in real time or online. 
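The two items carried over the communication protocol, the currently active IoT and the Status, can be pictured as a small JSON payload such as the one below. The field names and values are assumptions made for illustration; the paper does not publish its message schema.

# Hypothetical example of the payload a wearable could push to the cloud server,
# carrying the two items described above: which IoT from the AoT is currently
# active, and the actual health status classified on the device.
import json
import time

message = {
    "subject_id": "S-017",                  # anonymized subject identifier (assumed)
    "iot": {
        "device": "Samsung Gear S3",        # active IoT from the Array of Things
        "platform": "Tizen OS",
        "protocol_header": "smart-health/knee-rehab/v1",   # personalized header (assumed)
    },
    "status": {
        "classification": "Class A",        # actual health status, e.g. recovery class
        "confidence": 0.87,
        "timestamp": time.time(),
    },
}

payload = json.dumps(message)               # what would be sent over the wireless link
print(payload)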
Cloud server contains smart health algorithms built in on server, smart health data analytics and hybrid system platforms. Thus, cloud server is the service provider which provides data visualization using virtual technologies and services requested by the end user. Hybrid system platforms is based on hybrid system architecture platforms (HSAP) interfaced to wearable devices. As far as wearable devices connected are based on HSAP, they can transfer necessary smart health data into HSAP for processing. In this project, HSAP is restricted to wearable devices with Android platforms and its families such as Tizen OS platforms used for smart watches. HSAP is depicted in Fig. 2. Fig. 2. Hybrid System Architecture Platforms (HSAP). Main components of HSAP are smart data fusion, smart data analytics and deep learning. Smart data fusion is carried out using IoT currently active interfaced with actual health status of current human test subject under consideration in real time or/and online. Thus, this will facilitate to apply selected smart health algorithm in order to transform 602 S.M.N. Arosha Senanayake et al. active pattern set for smart data analytics. Smart data analytics is responsible to apply case based reasoning for the intelligent knowledge base already stored in cloud server such a way that transient health pattern set already available in the memory is the basis to retrieve the matching pattern set or/and revise and retain in the knowledge base. Deep learning techniques are implemented to produce the output to be either visualized as personalized health data or/and client services requested by clinicians or/and physio- therapists or/and trainers or/and subject under assessment which are primarily based on the established protocols and norms for injury prevention, performance enhancement and rehabilitation monitoring. In this research, Canadian protocols have been used to implement decision fusion algorithms to make a ?nal judgment as a wireless wearable assistive tool/device independent of location and human anthropometry. 3 Prototypes Built, Emulation and Validation The implementation of the AoT for Smart Health Solutions (SHS) is based on the criteria and norms (Canadian norms) currently practiced by the Performance Optimization Centre of Ministry of Defense and Sports Medicine and Research Centre of Brunei utilizing the standard guidelines established for injury prevention, performance enhancement and rehabilitation of soldiers and national athletes. Thus, AoT is designed by setting up di?erent functional/service units (currently in operation) as follows; reac- tive care, episodic response and clinic centric. Thus, smart health solutions at its current stage support the following functionalities across wearable devices and HSAP. • Personalized Wrist Band Data Centre for Healthy Lifestyle • Pre-clinical monitoring of movement disorders/abnormalities • Secure Personalized Performance Analysis Data Center • Personalized Recovery Progress Analysis and Classi?cation • Secure Sports/Military Personnel Performance Enhancement. A hybrid intelligent framework was developed by combining case-based reasoning (CBR) approach and adaptive intelligent mechanisms in order to build prototypes with di?erent functionalities. The framework utilizes the concept of solving new problems by using/modifying the similar previous experiences (problem-solution pairs). 
CBR problem-solving cycle consists of four steps [7, 12]: • Retrieve: Finding similar case(s) from the knowledge base whose problem descrip- tion best matches with the given problem. • Reuse: Reusing the solution of most similar case to solve the new problem. • Revise: Adapting/Modifying the chosen solution according to the di?erences in new problem. • Retain: Storing the new problem-solution pair as a case once it has been solved. Thus, designing intelligent hybrid knowledge based system is subject to the estab- lishment of knowledge base (KB) of smart health solutions using pattern sets currently Array of Things for Smart Health Solutions Injury Prevention 603 available and at the same time allowing the evolvement of KB with new pattern sets subject to CBR which is stored in a cloud server as depicted in Fig. 1. 3.1 Knowledge Base (KB) The structure of knowledge base for smart health solutions is depicted in Fig. 3. The knowledge base contains di?erent types of information including; raw and processed data, domain knowledge, historical data available for subjects (pre-injury, post-injury and recovery data) and session data during convalescence, case library (problem-solu- tion pair), reasoning and learning models (trained intelligent methods) and other relevant data (e.g. subjects’ pro?les, gender, activity type, etc.). Fig. 3. The structure of knowledge base for smart health solutions. In order to manage the knowledge base repository, a relational database was used to reduce the storage redundancy and provide ?exibility. The knowledge base evolves with the time-period when new problems are presented and new cases are added to the system 604 S.M.N. Arosha Senanayake et al. using CBR. This evolution process makes it more useful for domains where subject’s speci?c monitoring and prognosis mechanisms are required. In general, the information in KB can be represented as in (1): KB = [ pre_inj_Ii S , post_inj_Ij S , post_op_Ik S , T ( pre_inj_Ij S ) , T ( post_inj_Ij S ) , T ( post_op_Ik S ) , Sp, D, C, Mt ] (1) where pre_inj_Ii S : raw input data set of a group of subjects ‘S’ for di?erent activities at pre-injury (i.e. healthy) stage for i sessions (i =o 1) post_inj_Ij S : raw input data set of a group of subjects ‘S’ for di?erent activities during post injury for j sessions (j =o 1) post_op_Ik S : raw input data set of a group of subjects ‘S’ for di?erent activities during post-surgery (i.e. rehabilitation) for k sessions (k =o 1) T(pre_inj_Ii S ): processed input data set of a group of subjects ‘S’ for di?erent activ- ities at pre-injury (i.e. healthy) stage for i sessions (i =o 1) T(post_inj_Ij S ): processed input data set of a group of subjects ‘S’ for di?erent activ- ities during post-injury (i.e. before surgery) for j sessions (j =o 1) T(post_op_Ik S ): processed input data set of a group of subjects ‘S’ for di?erent activ- ities during post-surgery (i.e. rehabilitation) for k sessions (k =o 1) Sp: pro?le (e.g. gender, age, weight, height, type of injuries, activities etc.) of p subjects D: domain knowledge (e.g. type of protocols followed for subjects, local/standard norms for di?erent rehabilitation testing activities etc.) C: case library consisting of problem-solution pairs (processed input, rehabilitation procedure followed, outcomes and possible suggestions) related to individuals or di?erent group of subjects Mt: trained intelligent models for each activity t to be monitored. 
The designed KB is not a static collection of information, but it acts as a dynamic resource which has the capacity to learn and evolve with the passage of time when new problems are presented and new problem-solution pairs are added to the system using CBR. This evolution process makes it more useful for domains where subject’s speci?c monitoring and prognosis mechanisms are required. Thus, as an integral component of injury prevention, performance enhancement and rehabilitation, this KB has been used to optimize collection, organization and retrieval of relevant information for subjects using CBR. 3.2 Smart Health Solutions Service Provider Services de?ned by smart health solutions are tightly coupled with available AoT func- tional/service units and its functionalities across wearable devices and HSAP with the hybrid intelligent knowledge based system formed as explained in the Sect. 3.1. Hence, prototypes built, emulation and validation are carried out using reactive care, episodic Array of Things for Smart Health Solutions Injury Prevention 605 response and clinic centric under the careful supervision of specialists; clinicians, phys- iotherapists, trainers, test subjects, etc. Reactive Care. This service provides performance enhancement and injury prevention tools as proactive and preventive care services for healthy active lifestyle. If a person is concerned about daily active lifestyle, reactive care services produce required output data using daily healthcare records up to date using easy steps as follows: • Secure Personalized data center is responsible to store and to visualize all measure- ments of daily active lifestyle. • If a person is not active during working time, preventive care tool assists to ?nd and to determine causes. • Produce and generate personalized reports using data visualization tools. Episodic Response. These tools guarantee life long active daily life style by providing periodic monitoring and biofeedback control through appropriate intervention during critical stages. Episodic response tools provide services not only for today, it is about wellbeing throughout the life. Periodic monitoring of recovery stages upon the injury treatment will lead the returning to healthy active lifestyle within shortest possible time frame. These features are integrated using the following tools: • Pre-clinical monitoring of movement disorders/abnormalities. • Personalized Recovery Progress Analysis and Classi?cation by storing personalized data into a knowledge base in which pre-injury, post-injury and recovery data are stored and fused in the cloud server. • Real time biofeedback control using personalized wearable devices. Clinic Centric. Clinic centric service guides patients with rehabilitation protocols for recovery of injured joints/muscles or/and tiny muscle repair. The injury recovery is crucial to return to active daily healthy lifestyle. Progressive recovery percentage can be quanti?ed and visualized using following tools: • Secure personalized wrist band data center using wearable wireless sensor suit. • Integrated tiny muscle detector of damaged tiny muscle areas in relevant muscles up to mm2 . • Produce and generate personalized reports using virtual technologies interfaced with data visualization tools. 
In this research, prototypes built, emulation and validation of smart health solution services have been proven and tested using the following key and critical planned activ- ities: • Prototypes built for physical & mobility impairments, obesity, gait disorders, etc. • Incorporated intelligent user interfacing tools and real time biofeedback mechanisms in wearable devices (smart watches) and customized taking into consideration society needs. • Validate and test smart health solution service for di?erent types of human test subjects (ASEAN, Japan and USA) in di?erent clinical environment; Performance Optimization Centre and Sports Medicine and Research Center in Brunei. 606 S.M.N. Arosha Senanayake et al. 4 Case Studies Using AoT Built AoT is built using virtual measurement and instrumentation technologies (LabVIEW), Tizen OS emulator and smart watches for physical and mobility impairments, obesity and gait disorders community and for national athletes as healthy subjects in a society. In order to validate and test AoT so far built, clinical and laboratory environment were set up as illustrated in Fig. 4 at Performance Optimization Centre of Ministry of Defense, Sports Medicine and Research Centre of Ministry of Youth, Culture and Sports and Physiotherapy unit under Ministry of Health. Fig. 4. Clinical and laboratory set up for smart health solutions. 4.1 Case Study 1 – Injury Prevention and Rehabilitation A general framework of intelligent and interactive biofeedback virtual measurement and instrumentation system was built for physical and mobility impairments, obesity and gait disorders as smart health solution for soldiers and professional athletes, especially during rehabilitation monitoring. The application of machine learning techniques along with custom built wireless wearable sensor suit facilitated in building a knowledge base system for periodical rehabilitation monitoring of test subjects and providing a visual/ numeric biofeedback to the clinicians, patients and healthcare professionals. The vali- dated system is currently used as a decision supporting tool by the clinicians, physio- therapists, physiatrists and sports trainers for quantitative rehabilitation analysis of the subjects in conjunction with the existing recovery monitoring systems [5]. In order to perform real time recovery classi?cation of gait pattern for an ambulation activity, multi-class Support Vector Machine (SVM) is implemented using one – vs – all method. SVM has been extensively used as a machine learning technique for many biomedical signal classi?cation applications. The identi?cation of class/status from gait patterns of a new/actual subject can provide useful complementary information in order to make the adjustments in his/her rehabilitation process. Figure 5 illustrates LabVIEW Array of Things for Smart Health Solutions Injury Prevention 607 data ?ow diagram of SVM embedded into the Intelligent Graphical User Interfacing System (IGUIS) built [6]. Fig. 5. Data ?ow diagram of SVM for recovery classi?cation. Thus, interactive biofeedback visualization was designed to monitor rehabilitation and recovery status of subjects with physical and mobility impairments, obesity and gait disorders. There are two conditions accepted by biofeedback visualization. First condi- tion is the availability of gait pattern set of the subject in the KB (o?ine) while the second condition is the subject undergoing actual experiment to analyze current recovery status (real time). 
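The recovery-stage classifier just described, together with the fusion step applied to it later in this section, can be sketched compactly. In the notation of (2) below, the fused support for stage k is e_k = sum over i of [h_k(y_i) - h_k(y_(i-1))] * g(S_i), with the certainties h_k(y_i) taken in ascending order and h_k(y_0) = 0. The sketch uses a one-vs-all multi-class SVM to produce per-activity class supports and fuses them with a discrete Choquet integral; the gait features, activity densities, and the simple capped-additive fuzzy measure are illustrative assumptions, whereas the deployed system embeds its SVM in the LabVIEW-based IGUIS and derives its fuzzy measure from the classifier densities.

# Sketch of the Sect. 4.1 pipeline: one-vs-all SVM per activity, Choquet fusion across activities.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 12))        # 12 gait features per sample (assumed)
y_train = rng.integers(0, 4, size=200)      # labels 0..3 -> recovery classes A..D

ova_svm = OneVsRestClassifier(SVC(kernel="rbf", probability=True))   # one-vs-all multi-class SVM
ova_svm.fit(X_train, y_train)

x_new = rng.normal(size=(1, 12))
p = ova_svm.predict_proba(x_new)[0]         # support of each recovery class for one activity

def choquet(h, g):
    # discrete Choquet integral of supports h (dict classifier -> value) w.r.t. fuzzy measure g
    names = sorted(h, key=h.get)            # ascending order of support
    total, prev = 0.0, 0.0
    for i, name in enumerate(names):
        total += (h[name] - prev) * g(frozenset(names[i:]))
        prev = h[name]
    return total

densities = {"walk": 0.5, "jog": 0.3, "stairs": 0.4}   # importance of each activity classifier (assumed)

def g(subset):
    # simple monotone fuzzy measure built from the densities; the paper's construction
    # of g from the densities g_i is replaced by a capped additive measure for brevity
    return min(1.0, sum(densities[c] for c in subset))

# certainty h_k(y_i) that the subject is in stage k according to each activity classifier;
# the walking value comes from the SVM above, the other two are assumed for illustration
h_stage_A = {"walk": float(p[0]), "jog": 0.70, "stairs": 0.60}
print("fused support for recovery Class A:", round(choquet(h_stage_A, g), 3))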
In offline mode, the biofeedback visualization displays previously saved and visualized signals using IGUIS. The total time needed from real-time system software start-up until the output is produced is 20 s during real-time analysis, whereas in offline processing it is immediate. The visual output generated using IGUIS facilitates adjusting an individual subject's rehabilitation protocol using the standard procedures governing it.

Different classifiers may assign different classes to the same subject based on his/her performance during each activity, or due to misclassification. In addition to evaluating the output of an individual activity of a subject, an overall assessment can also be helpful to categorize the recovery stage of a subject after a certain rehabilitation period. The classification results of multiple activities for each subject's data have been combined using the Choquet integral method, as illustrated in (2). The Choquet integral is a non-linear functional defined with respect to a fuzzy measure $g_\lambda$, where $g_\lambda$ is completely determined by its densities ($g_i$, the degree of importance of classifier $y_i$ towards the final decision). The fusion of the different classifiers is computed based on (1) and (2) [8, 13].

$e_k = \sum_{i=1}^{t} \left[ h_k(y_i) - h_k(y_{i-1}) \right] g(S_i), \quad h_k(y_0) = 0$   (2)

where
$h_k(y_i)$: the certainty of the identification of subject S being in stage k using classifier $y_i$;
$g(S_i)$: the degree of importance of classifier $y_i$ of subject S towards the final decision;
$e_k$: the overall recovery stage of the fuzzy integration, based on the highest value computed for e in stage k of subject S.

Figure 6 shows the recovery classification classes of a knee-injured test subject extracted from the IGUIS built. Four classes (A, B, C and D) were formed using historical data collected and stored in the KB using fuzzy C-means clustering. Hence, classes A through D represent different stages of the health/recovery condition of subjects based on their gait patterns: Class A represents 2–6 months of recovery; Class B represents 7–12 months of recovery; Class C represents 13–24 months of recovery; Class D represents a healthy subject.

Fig. 6. Knee recovery classification of the subject classified as Class A in real time.
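As a concrete illustration of the fusion rule in (2), the short Python sketch below evaluates a discrete Choquet integral for one recovery stage. It is a sketch only: the certainty values, the densities, and the use of a Sugeno lambda-measure to build $g(S_i)$ from the densities are assumptions for the example and are not taken from the paper's data.

```python
# Illustrative sketch of the Choquet-integral fusion in Eq. (2); all numbers are invented.
import numpy as np
from scipy.optimize import brentq

# Certainty h_k(y_i) that the subject is in stage k, one value per activity classifier,
# and the density g_i (importance) of each classifier -- hypothetical values.
h = np.array([0.55, 0.70, 0.40])   # certainties from three activity classifiers
g = np.array([0.30, 0.50, 0.25])   # densities g_i of the classifiers

# Sugeno lambda-measure: solve prod(1 + lam*g_i) = 1 + lam.
# The densities here sum to 1.05 > 1, so the non-zero root lies in (-1, 0).
lam = brentq(lambda lam: np.prod(1.0 + lam * g) - (1.0 + lam), -0.999999, -1e-9)

def measure(indices):
    """Fuzzy measure of a set of classifiers under the lambda-rule."""
    m = 0.0
    for i in indices:
        m = m + g[i] + lam * m * g[i]
    return m

# Sort certainties ascending; S_i is the set of classifiers from the i-th smallest value upward.
order = np.argsort(h)
h_sorted = np.concatenate(([0.0], h[order]))        # prepend h_k(y_0) = 0
e_k = sum((h_sorted[i] - h_sorted[i - 1]) * measure(order[i - 1:])
          for i in range(1, len(h_sorted)))
print(f"Fused certainty e_k for this stage: {e_k:.3f}")
```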
The hybrid intelligent framework together with CBR, referred to as the smart health algorithms, is stored in the cloud server for the clinic-centric and episodic response care services, and data visualization can be obtained using wearable IoT devices. In this study, the Tizen OS visualization emulator was used as illustrated in Fig. 7, and the output was subsequently visualized on a Samsung Gear S3 smart watch as the IoT device (courtesy of Samsung Asia Pte Ltd, Singapore), as illustrated in Figs. 8 and 9, using JSON tools.

Fig. 7. Tizen OS emulator for real-time classification during rehabilitation.

Fig. 8. IoT devices for real-time classification during knee rehabilitation.

Fig. 9. Samsung Gear S3 smart watch for real-time classification during knee rehabilitation.

In this study, the Samsung Gear S3 smart watch works as an IoT injury prevention and rehabilitation tool, wirelessly connected regardless of the locations of clinicians and patients (soldiers). As the IoT tool (the Samsung Gear S3 smart watch) revises the pattern set using the actual (current) pattern set identified during the rehabilitation process, case-based reasoning is used to update the intelligent KB in the cloud server. Hence, clinicians were able to provide real-time biofeedback to patients, so that the soldiers monitored for rehabilitation and injury prevention followed the protocols given by clinicians in order to improve their recovery classification. With the IoT built so far for the soldiers' critical knee joint, clinicians were able to avoid second Anterior Cruciate Ligament (ACL) surgeries for women soldiers, who previously were commonly at risk of not returning to their soldiering career because no real-time biofeedback monitoring had been performed. Therefore, the Samsung Gear S3 smart watch used in this study as the IoT for real-time knee monitoring was capable of providing the current recovery classification of a knee-injured soldier without physical presence in the clinic; at the same time, based on the current classification, clinicians were able to provide new protocols to improve the knee rehabilitation process.

Currently, this IoT is used for soldiers, as soldiers are considered a reference/benchmarking population in a nation. Since this study has already proven the capability of real-time biofeedback monitoring using IoT via smart health data stored in and accessed through the cloud server set-up, the current study focuses on validating and testing members of the general public in the physiotherapy clinic of the government hospital and at Jerudong Park Medical Center (under the Gleneagles Hospital chain from Singapore), under the close routine monitoring of clinicians in the clinic. While patients take part in this pilot study voluntarily, smart watches sponsored by Samsung Asia Pte Ltd are used to revise the pattern set based on the pattern set collected in the home environment, by automatically updating the smart health data in the cloud server.

4.2 Case Study 2 – Performance Enhancement

A hybrid framework combining Self-Organizing Maps (SOMs) and CBR for clustering, accessing, examining and recommending training procedures for the performance enhancement of national athletes has been implemented. This system is intended to assist sports professionals, coaches or clinicians to maintain records of subject and experiment information, diagnose improper movements based on the KB, provide recommendations for improvement and monitor the progress of performance over a period of time. The IGUIS is built to facilitate monitoring and to provide instantaneous biofeedback during training sessions. The IGUIS supports a range of features necessary in real-time applications, which are clustered into separate frames for simplicity and ease of use.

Figure 10 illustrates the IoT platforms used for real-time data visualization during the performance enhancement of national athletes, based on the hybrid framework combining SOMs and CBR implemented as smart health algorithms in the cloud server, in order to derive personalized performance enhancement of athletes using the reactive care and episodic response services provided by the smart health solutions. In this study, the Tizen OS emulator followed by the Samsung Gear S3 smart watch was used to visualize data, applying database-driven neural computing interfaced with JSON tools, as illustrated in Fig. 11.

Fig. 10. IoT platforms for athletes' performance enhancement using hybrid intelligent computing.

Fig. 11. Samsung Gear S3 smart watch for athletes' performance enhancement in real time.

In this study, the database-driven neural computing system was used to monitor the different activities instructed by coaches during their training regime. Different coaches use different protocols and standards to classify national athletes.
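The SOM half of the hybrid SOM+CBR framework can be pictured with a small sketch. The snippet below is an illustration only: it uses the third-party MiniSom library and invented athlete feature vectors (none of which come from the paper) to cluster training-session profiles onto a 2-D map whose cells could then be linked to stored cases for CBR retrieval.

```python
# Illustrative SOM clustering of athlete training profiles (not the authors' implementation).
import numpy as np
from minisom import MiniSom  # third-party library: pip install minisom

# Hypothetical feature vectors: one row per training session (e.g. jump height, speed, HR metrics).
profiles = np.random.rand(150, 8)

# A 5x5 map: each cell becomes a cluster of similar training profiles.
som = MiniSom(5, 5, input_len=8, sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(profiles)
som.train_random(profiles, num_iteration=2000)

# The winning cell for a new session indexes the cluster whose stored cases
# (past sessions and their recommendations) a CBR step would retrieve and reuse.
new_session = np.random.rand(8)
print("Best-matching map cell:", som.winner(new_session))
```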
In general, however, the expectation is that each athlete performs at an excellent or very good level in the different activities assigned during the training regime; otherwise the athlete is automatically considered not to deserve a place in the national pool. Hence, women netball players in a training regime were considered under the close monitoring of coaches and a physical strength and conditioning specialist who use Canadian protocols. There are pre-defined activities set by the coach during the training regime in order for the coach to decide the positioning of players in forthcoming international games/tournaments. By wearing the smart watch during the training regime in the indoor stadium and performing the pre-defined physical exercises given by coaches and clinicians, just before the subsequent training regime the coach and clinicians have access to each athlete's profile pattern set updated in the cloud server. The Samsung Gear S3 smart watch, considered as an IoT worn by each athlete, automatically visualizes the transient health status of the personalized classification from cloud storage before the actual regime starts, which is fundamental for healthcare professionals, in this case coaches, to determine the performance level the athlete should undertake in the actual training regime onsite, online and in real time. Hence, coaches and clinicians are able to make a judgment and/or re-adjust the training regime of each athlete with updated/revised protocols for the forthcoming training regimes and actual games, based on real-time biofeedback monitoring.

5 Comparative Analysis with Existing Systems

The Array of Things (AoT) using virtual measurement and instrumentation technologies for smart health solutions addressed in this research work is novel. While specific application domains exist that use augmented, virtual and mixed realities, none of the existing applications introduces a generalized architecture similar to the Hybrid System Architecture Platform (HSAP), which allows interfacing and mapping to a specific domain of interest using cloud computing. Further, this article addresses the solution space using wearable technologies, from the acquisition of the personalized health pattern set via a multimodal healthcare system using a personalized wrist-band data center, while the IoT itself, in this case the Samsung Gear S3 smart watch, performs real-time biofeedback monitoring based on the transient health status (classification) and the current/actual health status (classification or recovery status) onsite, online and in real time during injury prevention, performance enhancement and rehabilitation using cloud computing. Hence, there is no concrete evidence in the literature against which to perform a comparative analysis, because the solutions provided so far are domain-centric within digital healthcare technologies and services.

6 Conclusions

The Array of Things (AoT) for smart health solutions during injury prevention, performance enhancement and rehabilitation, the futuristic concept introduced in this research work, was proven by interfacing virtual measurement and instrumentation (LabVIEW from NI) and IoT platforms (the Samsung Gear S3 smart watch). An intelligent graphical user interfacing system was built to assist the formation of an intelligent knowledge base, which is an evolving smart health pattern storage using case-based reasoning via retrieve, reuse, revise and retain mechanisms during real-time biofeedback monitoring.
At its current stage, the cloud storage consists of smart health data processed according to the Canadian standard protocols established by coaches, clinicians, physiotherapists and physical strength and conditioning specialists at the Performance Optimization Center of the Ministry of Defense and at the Sports Medicine and Research Center of the Ministry of Youth, Culture and Sports, using the nation's active healthy population: soldiers and professional athletes. Two case studies have been conducted during their training regimes under the close monitoring of different specialists. The AoT for smart health solutions concept was proven using IoT platforms during real-time feedback monitoring, and at the same time references and benchmarks were established based on the nation's active healthy population of soldiers and athletes.

This will allow norms to be established for the general public for their health and safety monitoring during real-time biofeedback monitoring, using these IoT platforms as assistive tools/devices for different health classifications and recovery statuses regardless of the patient's location, whether at home and/or at the clinic, under the close monitoring of different specialists. Thus, the services provided by AoT (reactive care, clinic centric and episodic response) provide the platform to personalize IoT devices for healthcare using database-driven neural computing platforms. Therefore, the futuristic goal of this ongoing research is the utilization of different deep learning algorithms, in particular reinforcement learning mechanisms, for smart data analytics geared towards smart data visualization and services.

Acknowledgments. This publication is part of the output of the ASEAN Institutes of Virtual Organization at the National Information and Communications Technology (NICT), Tokyo, Japan; ASEAN IVO project with the title "IoT system for Public Health and Safety Monitoring with Ubiquitous Location Tracking". This research is also partially funded by the University Research Council (URC) grant scheme of Universiti Brunei Darussalam under grant No. UBD/PNC2/2/RG/1(195).

References

1. Michael, E.P.: Introduction to the array of things. http://niu.edu/azad/_pdf/3-Michael_May18_2016.pdf
2. Alberto, J.E., et al.: Technology in Parkinson's disease: challenges and opportunities. Mov. Disord. 31(9), 1272–1282 (2016). https://doi.org/10.1002/mds.26642. Epub 29 April 2016
3. Vieira, A., Ribeiro, B., Ferreira, J.P.: GAIT analysis: methods & data review. CISUC technical report TR-2017-004, December 2017 (unpublished)
4. Arosha Senanayake, S.M.N., et al.: IntelliHealth solutions: technology licensing. http://intelli-health.org/
5. Yahya, U., Arosha Senanayake, S.M.N., Naim, A.G.: Intelligent integrated wearable sensing mechanism for vertical jump height prediction in female netball players. In: Eleventh International Conference on Sensing Technology (ICST), Sydney, Australia, pp. 94–100 (2017). https://doi.org/10.1109/icsenst.2017.8304484
6. Filzah Pg Damit, D.N., Arosha Senanayake, S.M.N., Malik, O., Jaidi Pg Tuah, P.H.N.: Instrumented measurement analysis system for soldiers' load carriage movement using 3-D kinematics and spatio-temporal features. Measurement 95, 230–238 (2017)
7. Wulandari, P., Arosha Senanayake, S.M.N., Malik, O.A.: A real-time intelligent biofeedback gait patterns analysis system for knee injured subjects. In: Nguyen, N.T., et al. (eds.) Intelligent Information and Database Systems, Part II.
Lecture Notes in Artificial Intelligence (LNAI), vol. 9622, pp. 703–712. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49390-8_68
8. Arosha Senanayake, S.M.N., Malik, O.A., Iskandar, P.M., Zaheer, D.: A knowledge-based intelligent framework for anterior cruciate ligament rehabilitation monitoring. J. Appl. Soft Comput. 20, 127–141 (2014)
9. Senanayake, C., Arosha Senanayake, S.M.N.: A computational method for reliable gait event detection and abnormality detection for feedback in rehabilitation. Comput. Methods Biomech. Biomed. Eng. 14(10), 863–874 (2011)
10. Alahakone, A.U., Senanayake, A.: A real-time interactive biofeedback system for sports training and rehabilitation. Proc. IMechE J. Sports Eng. Technol. 224(Part P), 181–190 (2010)
11. Gouwanda, D., Arosha Senanayake, S.M.N.: Emerging trends of body-mounted sensors in sports and human gait analysis. In: International Federation for Medical and Biological Engineering Book Series, Chap. 102. Springer, Heidelberg (2008). ISBN 978-3-540-69138-9
12. Aamodt, A., Plaza, E.: Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Commun. 7, 39–59 (1994)
13. Murofushi, T., Sugeno, M.: An interpretation of fuzzy measures and the Choquet integral as an integral with respect to a fuzzy measure. Fuzzy Sets Syst. 29, 201–227 (1989)

Applying Waterjet Technology in Surgical Procedures

George Abdou and Nadi Atalla
New Jersey Institute of Technology, Newark, USA
{abdou,na76}@njit.edu

Abstract. The main objective of this paper is to predict the optimal waterjet pressure required to cut, drill or debride the skin layers without causing any damage to the organs. A relationship between the waterjet pressure and the skin thickness has been established. It also includes the modulus of elasticity of the skin, the diameter of the nozzle orifice, the nozzle standoff distance and the traverse speed of the waterjet, as well as the duration of applying the waterjet pressure. Thus, a practical relationship between the waterjet operating parameters and the physical properties of the skin has been formulated. Data from a real Caesarean section procedure have been applied to the formulation. Given an Ultimate Tensile Strength of the skin at the abdomen of 20 MPa, incision parameters of 18 mm deep, 12 cm long and 0.4 mm wide, a traverse speed of 0.5 mm/s and a stand-off distance of 5 mm, the resulting waterjet pressure is 17.89 MPa using a 0.4 mm orifice diameter.

Keywords: Waterjet · Surgery · Skin · Incision

1 Introduction

Waterjet technology has been used in several applications such as industrial cutting, drilling and cleaning. Furthermore, waterjet technology can also be used in the medical field; applications include dentistry, wound cleaning and other surgical operations. Over the years, waterjet techniques have been developed into a revolutionary cutting tool in a variety of types of surgery [1]. It can be used in the precision cutting of skin for any type of surgery. The tool is simply moved in a line to apply the pressure and make the cut. The main advantage of waterjet incision is its precision; it is as effective as a laser cutter. However, the waterjet incision does not cause any thermal damage to the separated tissue, owing to its cooling ability. Additionally, the waterjet washes away blood, which eliminates the extra tools that would be required to do this in a conventional cut [2].
In vivo and in vitro experiments on patients and animals have been conducted with continuous waterjets at different low pressures. However, few studies have focused on the skin. Further analyses of the relationship among the operating parameters of the waterjet and the structure and mechanical properties of the skin must be conducted.

2 Literature Review

Waterjet technology is currently used for cutting a wide range of materials. The main advantages of this technology include the lack of a thermal effect on the material being cut. While waterjets are applied in all kinds of industries, only the medical field will be highlighted here. Table 1 summarizes some of the applications of waterjet cutting in the medical field. The performance of the waterjet machining process depends on the water pressure of the jet and the elastic properties of the skin. The initial impact is considered to be the highest; it occurs when the waterjet hits the tissue. After that, the water starts flowing radially and the impact of the jet decreases [4].

2.1 Waterjet in Surgical Wound Debridement

Waterjet technology can be used for surgical wound debridement and surgical interventions where selective cutting is necessary. Surgical wound debridement uses devices on the market such as VersaJet and Debritom, while surgical interventions use devices such as Jet Cutter 4, Helix HydroJet and ErbeJet2 [4]. A study in 2006 introduced the Versajet waterjet as an alternative to standard surgical excisional techniques for burn wounds. In the study, the Versajet waterjet was able to sufficiently debride superficial partial-thickness and mid-dermal partial-thickness wounds for the subsequent placement of Biobrane. Additionally, the study demonstrated that the Versajet waterjet has an advantage in the surgical treatment of superficial to mid-partial-thickness burns on the face, hand and foot [5].

Table 1. Overview of using waterjet in medicine [3]

Type of surgery | Operation description | Benefits
Orthopedic | Cutting endoprostheses and bone | Cutting stays below the critical temperature
Dental | Cutting and grinding of dental materials | Reduces the risk of jagged teeth and reduces the need for anesthesia
General | Resection of soft tissues (liver, gall bladder, brain, kidney, prostate), cleaning wounds | Blood vessels and nerve fibers remain intact when the defined pressure is maintained; minimal bleeding, intact edges and precise cuts, lack of necrotic edge, reduced duration of myocardial ischemia
Plastic | Cleaning skin grafts, removal of tattoos, liposuction | Separation of the layers of tissue, higher accuracy of results without edema and contour changes
Dermatology | Removing dead skin | Possibility of dosing medications directly in the water jet

Another study, conducted in 2007, reviewed the versatility of the Versajet waterjet surgical tool in treating deep and indeterminate-depth face and neck burns. With ex-vivo histologic analysis of the depth of debridement on human skin, the study confirmed that a predictable and controlled depth of debridement could be obtained by adjusting the apparatus settings [6].

2.2 The Use of Waterjet Incision in Other Surgical Procedures

Waterjet technology in surgical procedures was first reported in 1982 for liver resection.
Over the years, the waterjet machining process has become a recognized technique in different surgical areas. Clinically, the waterjet technique is used for cutting soft tissues such as liver tissue. Experimentally, it is used for dissecting spleen, kidney and brain tissue. While these tissues can be cut at low water pressures, waterjet techniques can also cut bone and bone cement at much higher water pressures [7].

Studies have been done using waterjet technology to drill or cut bone or bone cement. A study in 2014 showed that such a cut requires a water pressure ranging between 30 MPa and 50 MPa, depending on the diameter of the nozzle. The study also summarized the different materials that were tested in previous analyses, the waterjet pressure required to cut them, and the nozzle diameter (Table 2).

A comparison between the existing systems and the proposed algorithm is illustrated in Table 3. The methods proposed in this study will provide more flexible and robust solutions for setting up the waterjet apparatus when used in surgical procedures.

3 Mathematical Formulation

The operating parameters of the waterjet machining process are determined by several independent variables. Table 4 summarizes these variables based on four system components: process, skin, nozzle and pump characteristics [8]. Figure 1 describes how each parameter controls the incision characteristics and illustrates the incision processes.

Table 2. Overview of required waterjet pressures to cut bone and bone cement [7]

Material tested | D_nozzle (mm) | Required pressure (MPa)
Human calcanei | 0.6 | 30
Human femora | 0.3 | 40
Bone cement | | 40
Human femora | 0.2 | 50
Bone cement | | 30
Human interface tissue | 0.2 | 12
 | 0.6 | 10

Table 3. Features of previous works and proposed methods
(Columns: Authors | Year | Type of study | Method used | Apparatus | Water purity | Pressure | Depth of incision | Width of incision | Cutting velocity | Orifice diameter | Stand-off distance | Angle | Feed rate/traverse speed)

Arif [8] | 1997 | Skin incision | Finite element analysis | Theoretical | 100% water | Fixed | Generated | Generated | N/A | Fixed | N/A | N/A | N/A
Vichyavichien [9] | 1999 | Skin incision | Finite element analysis | Theoretical | 100% water | Fixed | Generated | Generated | N/A | Fixed | Fixed | Fixed | N/A
Wanner et al. [10] | 2002 | Fat tissue incision | Ex vivo | Commercial | 0.9% saline | Fixed | Generated | N/A | Fixed | Fixed | Fixed | Fixed | N/A
Rennekampff et al. [5] | 2006 | Debridement of burn wounds | Ex vivo | Commercial | Sterile saline | Fixed | N/A | N/A | Fixed | Fixed | N/A | Fixed | N/A
Cubison et al. [11] | 2006 | Debridement of burns | Ex vivo | Commercial | N/A | Fixed | N/A | N/A | Fixed | Fixed | N/A | N/A | N/A
Tenenhaus et al. [6] | 2007 | Wound debridement | Ex vivo | Commercial | N/A | Fixed | N/A | N/A | Fixed | Fixed | N/A | N/A | N/A
Keiner et al. [12] | 2010 | Brain tissue dissection | In vivo | Commercial | 0.9% saline | Fixed | N/A | N/A | N/A | Fixed | N/A | N/A | N/A
Kraaij et al. [7] | 2015 | Interface tissue incision | In vitro | Custom | 100% water | Fixed | Generated | N/A | Fixed | Fixed | Fixed | Fixed | Fixed
Bahls et al. [4] | 2017 | Various tissue incision or abrasion and removal | In vivo | Commercial | 10% gelatin | Fixed | N/A | N/A | Fixed | Fixed | Fixed | Fixed | N/A
Proposed | 2018 | Skin incision | Mathematical/simulation | Matlab & Minitab | 100% water | Generated | Variable | Variable | Generated | Generated | Variable | Fixed | Variable

3.1 Surgical Incisions Main Components: Operation Characteristics

The three main components of a surgical incision are the width of the incision, the length of the incision and the depth of the incision.
Before performing the incision, the surgical team must have these three factors defined. The width of the incision as well as its length is determined by the individual surgery and the recommended incision specifications. When performing a skin incision, the depth of incision is determined by the skin thickness. Epidermal thickness differs by age, sex, gender, skin type, pigmentation, blood content, smoking habits, body site, geographical location and many other variables. For these reasons, a system which can adapt to these differences must be created.

Table 4. Waterjet incision parameters

Process characteristics | Skin characteristics | Nozzle characteristics | Pump characteristics
Depth of cut | Thickness | Stand-off distance | Pressure ratio
Width of cut | Hardness | Orifice diameter | Flow rate
Traverse (feed) rate | Consistency | Nozzle structure | Pump efficiency
Waterjet flow rate | | | Power

Fig. 1. Waterjet parameters and its components.

To develop metrics for skin thickness, high-frequency ultrasound technology is necessary. By applying the ultrasound apparatus to the area to be operated on, the skin thickness can be measured instantly and fed into the system, which determines the water pressure required for the skin incision. Other skin characteristics can also be determined from the ultrasound results; these include the elastic modulus of each of the skin layers as well as their tensile strength.

The total energy required for the skin incision, which is converted to pressure energy, is formulated as follows:

$PE = UTS \cdot Q_s$   (1)

where $UTS$ is the Ultimate Tensile Strength of the skin and $Q_s$ is the flow rate at which the waterjet removes the skin, calculated as follows.

For skin cutting and debridement:

$Q_{s,cut} = D_s L_s f$   (2)

For skin drilling:

$Q_{s,drill} = D_s w_s v_s$   (2a)

$D_s$ is the depth of incision, $L_s$ is the length of incision, $f$ is the traverse speed (feed rate), $w_s$ is the width of cut and $v_s$ is the velocity of the waterjet stream at the skin.

3.2 Waterjet Operating Conditions: Catcher Characteristics

To minimize the process noise, a catcher is necessary. The kinetic energy of the catcher is the remaining energy that is not absorbed by the skin incision process; it is formulated as follows:

$KE_c = \tfrac{1}{2} Q_c v_c^2 \rho_w$   (3)

where $\rho_w$ is the density of water and $Q_c$ is the flow rate at which the residual water goes into the catcher; it is the sum of the flow rate of water out of the nozzle, $Q_n$, and the rate at which the waterjet removes the skin, $Q_s$. The velocity at which the excess water goes to the catcher, $v_c$, is

$v_c = \sqrt{2gx}$   (4)

where $g$ is the gravitational acceleration and $x$ is the standoff distance.

3.3 Waterjet Operating Conditions: Nozzle Characteristics

The kinetic energy of the waterjet stream coming out of the nozzle is the sum of the pressure energy required for the skin incision and the kinetic energy of the catcher:

$KE_n = PE + KE_c$   (5)

Looking at the nozzle characteristics of the waterjet incision, this kinetic energy (5) is also equal to

$KE_n = \tfrac{1}{2} Q_n v_n^2 \rho_w k_e$   (6)

where $v_n$ is the velocity of the waterjet stream coming out of the nozzle and $k_e$ is the loss coefficient.

The waterjet nozzle converts high-pressure water into a high-velocity jet. The performance of the waterjet incision is affected by several variables such as the nozzle orifice diameter, water pressure, incision feed rate and standoff distance. In the medical field, waterjet incision devices usually use low to medium pressure as well as a small nozzle design that differs from industrial waterjets.
A relationship between the velocity of the waterjet stream coming out of the nozzle ($v_n$) and the velocity of the waterjet stream at the skin ($v_s$) can be described as follows:

$v_n = v_s e^{ax}$   (7)

where $a$ is the taper index and $x$ is the standoff distance of the nozzle. Assuming a straight-taper waterjet nozzle design, the flow of water from the nozzle to the atmosphere is affected by the area and the shape of the orifice. Table 5 lists the different orifice types and the typical values of the contraction ($C_c$) and loss ($k_e$) coefficients for water orifices.

Table 5. Types of orifices and their coefficient values [13]

Orifice | Description | Cc | Ke
SE | Sharp-edged | 0.63 | 0.08
RE | Round-edged | 1.0 | 0.10
TSE | Tube with square-edged entrance | 1.0 | 0.51
TRE | Short tube with rounded entrance | 0.55 | 0.15

From (1) through (7), $Q_n$ and $v_n$ are calculated as follows.

For cutting and debridement:

$Q_n = \dfrac{2\,PE_{cut} + 2gx\rho_w D_s L_s f}{\rho_w v_n^2 - 2gx\rho_w}$   (8)

$v_n = \sqrt{\dfrac{2\,PE_{cut} + 2gx\rho_w D_s L_s f}{Q_n \rho_w} + 2gx}$   (9)

For drilling:

$Q_n = \dfrac{2\,PE_{drill} + 2gx\rho_w D_s w_s v_s}{\rho_w v_n^2 - 2gx\rho_w}$   (8a)

$v_n = \sqrt{\dfrac{2\,PE_{drill} + 2gx\rho_w D_s w_s v_s}{Q_n \rho_w} + 2gx}$   (9a)

The relationship between $Q_n$ and $v_n$ can also be represented by

$Q_n = C_c A_n v_n$   (10)

where $A_n$ is the area of the orifice of the nozzle,

$A_n = \dfrac{\pi d_n^2}{4}$   (11)

and $d_n$ is the orifice diameter of the nozzle.

3.4 Waterjet Operating Conditions: Pump and Intensifier Characteristics

The relationship between the velocity of the waterjet flow coming out of the pump reservoir and the velocity coming out of the nozzle is calculated as follows:

$v_r = v_n e^{2bL_n}$   (12)

where $L_n$ is the length of the nozzle and $b$ is the exponential constant, which is based on an exponential-taper waterjet nozzle design:

$b = \dfrac{\ln(d_n / d_o)}{L_n}$   (13)

where $d_o$ is the diameter of the top of the nozzle. The pressure ratio ($r_p$) between the water outlet pressure ($P_{w2}$) and the oil inlet pressure ($P_{o1}$), and equally between the oil inlet area ($A_o$) and the water inlet area ($A_w$), is described as follows:

$r_p = \dfrac{P_{w2}}{P_{o1}} = \dfrac{A_o}{A_w}$   (14)

The waterjet flow rate out of the intensifier ($Q_i$) is equal to the waterjet flow rate coming out of the nozzle ($Q_n$). By design, the hydraulic intensifier increases the pressure of the water. Thus, the water pressure coming out of the intensifier ($P_{w2}$) is determined by the power ($W$), the efficiency of the intensifier ($\eta_i$) and the flow rate ($Q_i$) as follows:

$P_{w2} = \dfrac{W \eta_i}{Q_i}$   (15)

4 Application Example and Results

In this example of a Caesarean section procedure, a Pfannenstiel transverse incision is assumed. This curved incision (length of incision $L_s$) is approximately 10–15 cm long and lies 2 cm above the pubic symphysis [9]. Using the waterjet, the skin and rectus sheath are opened transversely. The rectus muscles are not cut and the fascia is dissected along the rectus muscles. The skin thickness at the abdomen for a female is approximately 2.30 mm, while the subcutaneous adipose tissue thickness at the abdomen is approximately 15.7 mm [10]. The UTS of the skin at the abdomen ranges between 1 and 24 MPa [11]. The exact thickness of the skin and its characteristics would be measured using high-frequency ultrasound. The width of cut is 0.4 mm; in a traditional incision, a #10 (0.4 mm) blade is used [12, 13].
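To make the formulation concrete, the short Python sketch below evaluates the skin-side and catcher-side terms of equations (1), (2), (4), (10) and (11) for the Caesarean-section inputs just listed. It is a simplified illustration only: the nozzle velocity is treated as a given input rather than solved for through (8)–(15), and the round-edged orifice coefficient is an assumption, so this is not the authors' Matlab/Minitab simulation.

```python
# Simplified numerical sketch of Eqs. (1), (2), (4), (10), (11); not the paper's full solver.
import math

# Caesarean-section inputs from the application example (SI units).
UTS = 20e6          # ultimate tensile strength of abdominal skin, Pa
D_s = 18e-3         # depth of cut, m
L_s = 0.12          # length of cut, m
f   = 0.5e-3        # traverse (feed) rate, m/s
x   = 5e-3          # stand-off distance, m
d_n = 0.4e-3        # nozzle orifice diameter, m
rho_w = 1000.0      # density of water, kg/m^3
g   = 9.8           # gravitational acceleration, m/s^2
C_c = 1.0           # contraction coefficient, assuming a round-edged orifice (Table 5)

Q_s = D_s * L_s * f                 # Eq. (2): skin removal flow rate for cutting
PE  = UTS * Q_s                     # Eq. (1): energy demand converted to pressure energy
v_c = math.sqrt(2.0 * g * x)        # Eq. (4): velocity of excess water into the catcher
A_n = math.pi * d_n ** 2 / 4.0      # Eq. (11): orifice area

print(f"Q_s = {Q_s:.3e} m^3/s,  PE = {PE:.1f} W,  v_c = {v_c:.2f} m/s")

# With a nozzle velocity taken as given (e.g. the 151.05 m/s reported in the example),
# Eq. (10) gives the corresponding nozzle flow rate.
v_n = 151.05
Q_n = C_c * A_n * v_n               # Eq. (10)
print(f"A_n = {A_n:.3e} m^2,  Q_n = {Q_n:.3e} m^3/s")
```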
Table 6 summarizes the operation characteristics. The waterjet velocity coming out of the nozzle ($v_n$) is 151.05 m/s, while the waterjet velocity that reaches the skin ($v_s$) is 150.86 m/s. The velocity of the excess water going to the catcher is very minimal, at 0.31 m/s. The calculated power required for the intensifier is 423.52 W. Assuming the efficiency of the intensifier ($\eta_i$) is 80%, the calculated pressure required for the Caesarean section operation is 17.89 MPa with a 0.4 mm nozzle orifice diameter.

The results obtained from this study can be summarized as follows:

1. The mathematical formulation for the different incision processes has been developed and simulated for the best results.
2. Using the cutting incision, an application example has been demonstrated.
3. The data applied has been extracted from a real-life application.

Table 6. Caesarean section operation characteristics [14–19]

Parameter | Value
Depth of cut (Ds) | 18.00 mm
Length of cut (Ls) | 12.00 cm
Width of cut (ws) | 0.40 mm
Ultimate Tensile Strength (UTS) | 20.00 MPa
Density of water (ρ) | 1.00 g/cm³
Feed rate (f) | 0.50 mm/s
Gravity (g) | 9.80 m/s²
Stand-off distance (x) | 5.00 mm
Taper (a) | 0.25

5 Conclusion and Recommendations

Given any surgical operation characteristics, this mathematical model is able to calculate the optimal operating conditions for surgical cutting, debridement or drilling. This will help the surgeon pick the right nozzle size as well as the right waterjet instrument parameters such as pressure, power and velocity. The next step is to use the results of the study to create a comprehensive surgical procedure simulation model, such as a Caesarean section procedure or any other surgical procedure that is needed.

References

1. Areeratchakul, N.: Investigation of water jet based skin surgery (2002)
2. Yildirim, G.: Using water jet technology to perform skin surgery (2003)
3. Hreha, P., Hloch, S., Magurová, D., Valícek, J., Kozak, D., Harnicárová, M., Rakin, M.: Water jet technology used in medicine. Tech. Gaz. 17(2), 237–240 (2010)
4. Bahls, T., et al.: Extending the capability of using a waterjet in surgical interventions by the use of robotics. IEEE Trans. Biomed. Eng. 64(2), 284–294 (2017)
5. Rennekampff, H.-O., Schaller, H.-E., Wisser, D., Tenenhaus, M.: Debridement of burn wounds with a water jet surgical tool. Burns 32, 64–69 (2006)
6. Tenenhaus, M., Bhavsar, D., Rennekampff, H.-O.: Treatment of deep partial thickness and indeterminate depth facial burn wounds with water-jet debridement and a biosynthetic dressing. Inj. Int. J. Care Inj. 38, 538–544 (2007)
7. Kraaij, G., et al.: Waterjet cutting of periprosthetic interface tissue in loosened hip prostheses: an in vitro feasibility study. Med. Eng. Phys. 37(2), 245–250 (2015)
8. Arif, S.M.: Finite element analysis of skin injuries by water jet cutting. In: Mechanical and Industrial Engineering. New Jersey Institute of Technology, Newark (1997)
9. Vichyavichien, K.: Interventions of water jet technology on skin surgery (1999)
10. Wanner, M., Jacob, S., Schwarzl, F., Oberholzer, M., Pierer, G.: Optimizing the parameters for hydro-jet dissection in fatty tissue - a morphological ex vivo analysis. Eur. Surg. 34(2), 137–142 (2002)
11. Cubison, T.C.S., Pape, S.A., Jeffery, S.L.A.: Dermal preservation using the Versajet® hydrosurgery system for debridement of paediatric burns. Burns 32, 714–720 (2006)
12.
Keiner, D., et al.: Water jet dissection in neurosurgery: an update after 208 procedures with special reference to surgical technique and complications. Neurosurgery 67(2), 342–354 (2010)
13. Abdou, G.: Analysis of velocity control of waterjets for waterjet machining. In: Waterjet Cutting West. Society of Manufacturing Engineers, Los Angeles (1989)
14. Raghavan, R., Arya, P., Arya, P., China, S.: Abdominal incisions and sutures in obstetrics and gynaecology. Obstet. Gynaecol. 16, 13–18 (2014)
15. Akkus, O., Oguz, A., Uzunlulu, M., Kizilgul, M.: Evaluation of skin and subcutaneous adipose tissue thickness for optimal insulin injection. Diabetes Metab. 3(8) (2012)
16. Jansen, L.H., Rottier, P.B.: Some mechanical properties of human abdominal skin measured on excised strips. Dermatology 117(2), 65–83 (1958)
17. Ritter, J.: The modern-day C-section. Surg. Technol. 159–167
18. FST Homepage. https://www.finescience.com/en-US/Products/Scalpels-Blades/Scalpel-Blades-Handles/Scalpel-Blades-10. Accessed 8 Apr 2018
19. WardJet Homepage. https://wardjet.com/waterjet/university/precision-quality. Accessed 31 Mar 2018

Blockchain Revolution in the Healthcare Industry

Sergey Avdoshin and Elena Pesotskaya
National Research University Higher School of Economics, 20 Myasnitskaya ulitsa, 101000 Moscow, Russian Federation
{savdoshin,epesotskaya}@hse.ru

Abstract. The paper analyses the possibility of using blockchain technologies in the sphere of healthcare. Modern society requires new tools, e.g. distributed ledgers and smart contracts, for sharing data between patients, doctors and healthcare professionals by giving them control over the data and allowing smarter cooperation. In this situation, utilizing blockchain technology can resolve integrity, data privacy, security and fraud issues, increase patient health autonomy and provide access to better services. This paper provides a review of blockchain technology and research into possible applications in healthcare, and gives an overview of positive trends and outputs.

Keywords: Blockchain · Distributed ledger · Smart contracts · Healthcare · Patient · Security

1 Introduction

Blockchain is already disrupting many industries. Initially it was intended as a banking platform for digital currency, but blockchain now has applications that go beyond financial transactions, and its use is becoming popular in many industries. The idea of blockchain is to use a decentralized system that can replace banks and other trusted third parties. A blockchain is a large structured database distributed among independent participants of the system. This database stores an ever-growing ordered list of records (blocks). Each block contains a timestamp and a reference to the previous block. A block cannot be changed arbitrarily: each member of the network can see that a transaction has taken place in the blockchain, and it is possible to perform a transaction only with the appropriate access rights (a private key). Blocks are not stored on a single server; this distributed ledger is replicated on thousands of computers worldwide, so users interacting on the blockchain do not need any intermediaries. Blockchain technology can be shared by individuals, organizations, and even devices. It saves time, increases transparency, and gives the ability to make everything a tradable asset. The World Economic Forum predicts that by 2027, it would be possible to store nearly 10% of the global gross domestic product on blockchains [1].
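The hash-linked structure just described can be shown in a few lines of code. The following Python fragment is a toy illustration of blocks carrying a timestamp and the hash of the previous block; it deliberately omits consensus, signatures and networking, so it is only a sketch of the data structure, not a production ledger.

```python
# Toy illustration of a hash-linked chain of blocks (no consensus, signing or networking).
import hashlib
import json
import time

def make_block(data, previous_hash):
    """Create a block whose identity depends on its payload, timestamp and predecessor."""
    block = {"timestamp": time.time(), "data": data, "previous_hash": previous_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

# Genesis block, then each new block references the hash of the block before it.
chain = [make_block({"event": "genesis"}, previous_hash="0" * 64)]
chain.append(make_block({"event": "record added"}, previous_hash=chain[-1]["hash"]))
chain.append(make_block({"event": "record shared"}, previous_hash=chain[-1]["hash"]))

def is_consistent(chain):
    """Every block must store the hash of its predecessor."""
    return all(chain[i]["previous_hash"] == chain[i - 1]["hash"] for i in range(1, len(chain)))

print("chain consistent:", is_consistent(chain))

# Tampering with an earlier block breaks the link that later blocks store,
# even if the tampered block's own hash is recomputed.
chain[1]["data"]["event"] = "record altered"
chain[1]["hash"] = hashlib.sha256(json.dumps(
    {k: chain[1][k] for k in ("timestamp", "data", "previous_hash")},
    sort_keys=True).encode()).hexdigest()
print("after tampering:", is_consistent(chain))
```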
The potential of blockchain has already been realized by many people - authors who want to protect their research and share the knowledge at the same time, by car owners who want to share their car or use rental cars with no 3rd parties’ commission. Even for people who want to share music or even space on their hard drive, but want to © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 626–639, 2019. https://doi.org/10.1007/978-3-030-02686-8_47 feel secure and protected at the same time with no involvement of counterparties. Many industries are thinking about the great potential and possibilities of blockchain tech-nology, and the strong positive effect it can have on people’s health and the healthcare system. The cost of medicine in the world is constantly growing. According to the Global Health Care report, world health spending in the world’s major regions will increase from 2.4% to 7.5% between 2015 and 2020 and will reach $8.7 trillion by 2020 [2]. This is influenced by many factors, including the increase and aging of the population, economic growth in developing countries, and others. Let’s analyse the basic needs for healthcare services that every patient and doctor face and the associated risks: • Organizing visits to the best healthcare professionals, ?nding trusted and afford-able care providers. What we can see now is the fact that though the prices for medical services are increasing rapidly, it is still dif?cult to ?nd the appropriate specialist and treatment for a symptom or disease or it requires long waiting lists. Availability of medical services for patients, access to the best possible treatments and innovative services are very important in the healthcare industry. Patients need to be able to search for care providers in a snap - even abroad if needed - with information on where a speci?c treatment is done with great care and without delay or sometimes after-hours access to medical care. • Storage, management and control of access to patients’ data. Patients need instant data access (including CT, MRI, x-rays, echocardiograms, ultrasounds, etc.) from any place on their mobile device, iPad or PC. Such access has become possible due to the digital revolution and a development of mobile healthcare, but still there is a question how a person can be assured about personal data being secure. Also patients face potential risks of data mismanagement, access limitations to their patient records, and decentralization of all personal healthcare data. • Communication with your doctor and community on a real time basis, getting access to knowledge, trainings, healthcare plans, and advisory services. Lack of communication between experts in different ?elds and the impossibility of a quick consultation with several specialists from one area of medicine causes lower quality and negative patient experiences. Patients expect to consult with a specialist who has a long history of treating and healing patients with similar symptoms. What we see now is a lack of incentives and personalized information about preventive care: visits from one specialist to another to get a clear view on a disease, manually searching Google and hoping that eventually, someone can help. • Easy and transparent payments for the medical services. Many people will agree that it would be convenient to use a single medical insurance around the world. Today, this is hampered by dif?culties with insurer checking and slow payments through a long chain of intermediaries. 
Additionally, patients want to pay not for the fact of seeing a specialist, but for the result that they receive. Currently, in most cases payment takes place before admission, or money is written off regardless of the outcome. Patients often overpay for repeated tests in multiple medical institu-tions, or alternatively, undergo unnecessary examinations. Telemedicine or mobile medicine can solve some of the raised issues, which has a great potential to reduce the uncertainty of diagnoses, increase accessibility from Blockchain Revolution in the Healthcare Industry 627 remote areas, improve the quality and ef?ciency of treatment as well as the cost-effectiveness. But it still faces many challenges associated with international payments and 3rd party fees, centralization, patient security, integrity, and trust - factors related to different organizational entities. Using blockchain technology, patients and society can also eliminate the potential risks of data mismanagement, access limitations, delays in prognosis and human manipulation. The contribution of this paper is twofold. Firstly this paper explores the potential applications of blockchain in the Health Industry by examining the core requirements of the healthcare interested parties and society. Secondly, the analysis of the existing solutions and applications helps generalising the framework and approach for choosing the appropriate technology. This paper aims to provide a foundation for evaluating the effects of a blockchain technology on healthcare ecosystem. The main research question: What are the possibilities of using Blockchain in the Healthcare Industry? To approach the research question we describe applications of blockchain in the health industry, based on the customer needs (Sect. 3), followed by the research of the blockchain technology and solutions (Sect. 4). In the discussion, we present the examples of several ICO launches and healthcare blockchain startups in practice. 2 Technology Investment Trends Healthcare has the most aggressive deployment plans of any industry: 35% of respondents in that industry say their company plans to deploy blockchain into pro-duction within the next calendar year [3]. Many people will agree that it would be great to use the insurance all over the world, having instant access to best healthcare pro-fessionals. Currently there are many dif?culties connected with insurance: long pro-cedures and slow payments with participation of many involved intermediates, security and trust. The Global Health Journal [4] published a research of projects that implement a blockchain technology in healthcare. Currently there are over a thousand blockchain startups, various open source implementations. There are dozens of blockchain com-panies targeting healthcare applications. According to an IBM survey, which involved 200 healthcare executives across sixteen countries, approximately 16% admitted to taking a proactive approach in adopting a commercial blockchain solution in 2017 [5]. Blockchain startups seek investments through initial coin offering (ICO) with tokens sold to the public - the startup exchanges “utility” tokens for cash. The initiated tokens provide utility within the network, and tokens are traded on secondary exchanges. ICOs and token launches are a growing method of blockchain ?nancing and investors are proactively participating in such ICOs as there is no time to lose. 
Con-tracts can be signed remotely, and the pro?t from ICOs has been growing over recent years, with investors getting their money back even if the ICO does not work. Investors hope to turn a pro?t by buying early access to potentially foundational blockchain 628 S. Avdoshin and E. Pesotskaya protocols and applications, just as early investors into bitcoin and Ethereum did. For reference, a $100 investment into bitcoin on January 1, 2011 would now be worth nearly $1.5 M. Over 250 blockchain teams have completed ICOs since January 2016, with more than 55% of them raised during or after July 2017. Cumulatively (since January 2016), the number of ICOs should surpass the number of equity deals in October 2017, emphasizing the hype around the ?nancing mechanism [6]. Currently Robomed Network (https://robomed.io/) is launching an ICO in order to attract $30 mln for network deployment in Russia and all over the globe. The Robomed Network is aimed at dramatically changing the healthcare environment and ecosystem by applying a smart contract and a value-oriented approach to medical services. The Robomed Network connects healthcare service providers and patients based on a smart contract, the value criteria of which are the performance metrics of a speci?c medical service and patient satisfaction. Another international blockchain healthcare provider UBI (http://www.globalubi. com/index.aspx) can be used for applications that record data about customer health and automatically change the tariffs depending on the client’s behavior based on a smart contract and already announced an ICO date. 3 Potential Applications of Blockchain in the Health Industry 3.1 Blockchain for Electronic Medical Records In today’s digital age, technology is at the core of all business and personal aspects. The rapidly evolving Internet of Medical Things (loMT) has made it dif?cult for the existing health IT infrastructure and architecture to support it effectively. It is estimated that by 2020, the number of connected healthcare loT devices will be 20–30 billion, up from 4.5 billion in 2015 [7]. Many big companies see great potential in building the interface between healthcare and the mobile industry and creating ecosystems and using devices. There has been a notice able increase in the amount of data generated regarding the health and lifestyle of consumers due to the IoT enabling more medical device activity. Currently healthcare organizations store large amounts of sensitive patient information with no single approach to cybersecurity that raise certain concerns about interoperability, data privacy, and fraud. The EHR (Electronic Health Records) system is believed to be of great bene?t to the mobile health sector of the future. However, in practice, their implementation is com-plex and expensive, and adoption on a global scale is low. EHRs were never assumed to support multi-institutional, life time medical records, unlike PHR. The concept behind PHR (Personal Health Record) is that medical records are stored by a third party provider so that they can be accessible in whole or in part by healthcare professionals as and when needed. Mobile PHR systems represent the potential for signi?cant changes in how medical data are stored and used. PHRs also represent a change in the “ownership” of health information - from the medical institution, or health authority, to the indi-vidual, who is thereby empowered. Eventually, the argument goes, the “cure” is replaced by continuous monitoring before any cure is needed [8]. 
Blockchain Revolution in the Healthcare Industry 629 Certain dif?culties arise in the establishment of an up-to-date healthcare system in Russia as a number of barriers need to be broken down in order to ensure proper communication between different stakeholders – connecting providers, physicians, patients, clinics, government, etc. Patients nowadays have personal data distributed among clinics, hospitals, labs and insurance companies. This ecosystem does not work very well because there is no single list of all the places data can be found or the order in which it was entered. Many Russian doctors don’t want patients to access EHRs, being concerned by the fact that the patient can get access to his entire medical history, and can draw wrong conclusions regarding the state of their health. This means that patients take a passive role in managing and tracking their health, having a lack of control and ownership that makes them feel disappointed in their care. Those patients who don’t ?nd proper care are discontented and their faith in medical professionals disappears. This in turn deteriorates trust towards physicians, which is why less than half (*34%) of patients trust medical professionals compared to a 70%+ rate 50 years ago [9]. Concerns about the integrity and cybersecurity of patient data have always plagued the healthcare industry. In 2016 alone, around 450 data breaches were reported according to the Protenus Breach Barometer report. This impacted over 27 million patients. The breaches were mostly caused by insiders; human error or theft of data, amounted to 43% of the breaches, whereas the others were due to hacks, ransomware or malware [10]. A solution would be a record management system that can handle EHRs based on blockchain technology. It helps to guarantee data integrity and protect patient privacy by handling access rights to a particular pool of data and ensuring that personal data does not fall into the wrong hands. In blockchain personal data do not have to be placed somewhere: everything is stored on the client’s device, and only their con?rmation is stored in the blockchain system. Being decentralized, the technology of blockchain can ensure that data is stored securely in chronological order, in millions of servers and devices. This chronological chain of activity is shared—everyone participating on the network can maintain a complete activity history. Cryptography (encoding) is used to ensure that previously veri?ed data modi?cations are safe. The permissions for the data access also stored on the blockchain, and the patients’ data is only accessible by the party to whom access was granted, despite this data being hosted in a decentralized manner. Every modi?- cation of data is agreed to by the participants on a network according to the established rules and the data can be trusted without having to rely on a central authority like ?nancial organization or government. In blockchain technology patients are able to access securely and move their medical records between different healthcare organizations. Whenever required, the data from the various connected devices can be accessed instantly using the unique key assigned to the medical professionals. During the visit of a new patient the doctor can consult the system and other specialists, get all the necessary information on the state of the patient’s health, and plan appropriate treatment. 
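A minimal sketch of the access-rights idea described above might look like the following Python fragment. It is illustrative only: the record hashes, grant entries and participant names are hypothetical, and a real deployment would keep such entries on an actual replicated ledger with digital signatures rather than an in-memory list.

```python
# Illustrative sketch: patient-issued access grants and record fingerprints on an append-only log.
import hashlib
import time

ledger = []  # stand-in for an append-only, replicated ledger

def append_entry(entry):
    """Append an entry together with the hash of the previous entry (append-only log)."""
    previous_hash = ledger[-1]["hash"] if ledger else "0" * 64
    payload = f"{previous_hash}|{sorted(entry.items())}|{time.time()}"
    ledger.append({**entry, "previous_hash": previous_hash,
                   "hash": hashlib.sha256(payload.encode()).hexdigest()})

# The medical record itself stays off-chain; only its fingerprint is registered.
ehr_document = b"...encrypted EHR payload held by the patient/provider..."
append_entry({"type": "record", "patient": "patient-001",
              "record_hash": hashlib.sha256(ehr_document).hexdigest()})

# The patient grants a named clinician read access; a revocation would be another entry.
append_entry({"type": "grant", "patient": "patient-001",
              "grantee": "dr-smith", "scope": "read"})

def has_access(patient, grantee):
    """Check the latest grant/revoke entry for this patient-grantee pair."""
    decision = False
    for entry in ledger:
        if entry.get("patient") == patient and entry.get("grantee") == grantee:
            decision = entry["type"] == "grant"
    return decision

print("dr-smith may read patient-001's records:", has_access("patient-001", "dr-smith"))
```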
Such collaboration of patient and doctor reduces the need to rely on intermediaries, the amount of time wasted while waiting, and inconsistent treatment plans from different healthcare professionals. 630 S. Avdoshin and E. Pesotskaya All this improves patients trust and satisfaction. For this reason, blockchain technology has been referred to as a “trust machine” [11]. We can see a growth of decentralized health platforms with a portable, secure, and self-sovereign personal health record (PHR) built on blockchain technology and designed to drive healthy patient behavior through the security token. Usually a plat-form provides access to patient-controlled health records, including medication, diag-nosis, care plan, complex medical imaging, patient generated behavior data, key vital signs generated outside of the clinic including weight, blood pressure, sleep, stress levels, glucose, and more. The platforms pull information from electronic health record systems, as well as from all personal sources of patient-generated data including the web, mobile applications, and connected devices. Patients grant permissions for data access via smart contracts embedded in the blockchain, and executions performed via the application. The mobile app then allows users to create an individual pro?le through which they can review their health information, connect with care providers or even chat to patients with similar conditions. Platforms are designed to be fully compatible with existing EMR systems, and work like an API. Hospitals and health care providers usually are able to use the same equipment and technology with only a minor change to their backend. Among the most popular platforms we can distinguish MintHealth [https://www.minthealth.io/], HealthHeart [https://www.healthheart.io/], Patientory [http://www.patientory.com/], MedRec by Media Lab [https://medrec. media.mit.edu/] and many others. Doctors, health systems, health coaches, case man-agers, family, and friends can gain access to the data via social modules embedded in the applications that will serve to build awareness around the healthcare chronic conditions via a patient-centered community. Of course, only patients can specify who can access their health records. The advantages of using blockchain technologies apply to many participants within ecosystem: • “Medical history right in the pocket” and direct access to healthcare for Patients. Patients get instant access to health information and the medical community to learn more about treatment and therapy, get 24.47 advisory services, trainings, education and access to care plan information. The patient community and EHR can even be referenced in an emergency or when travelling abroad when quick access to medical records is needed. Also patients will be able to search for care providers in a snap - even abroad if needed - with information on where a speci?c treatment is done with great care and without a long wait. • “A data sharing platform for providing a personalised medicine” - for healthcare professionals. Doctors, health coaches and healthcare advisors get instant access to medical history information including complete notes from other medical organi-zations. They can interact with patients more ef?ciently being able to leverage a proven clinical tool with built-in automation. 
- complete view of their patients’ history, including out-of-network encounters, prescription ?lls, and lifestyle infor-mation, and can eliminate the administrative burden associated with medical record transfers. Doctors can reach relevant patients, build online reputation, and get access to the latest technological possibilities. Blockchain Revolution in the Healthcare Industry 631 • “Cost-saving” for Healthcare Organizations and Insurance companies. They save costs on data gaps by using improved standards of care, involving the patient in their care plan, providing medication reminders, appointment booking and tools to track personal health that have a positive impact and improve clinical outcomes. Having a more complete picture of a patient’s health condition, insurers and healthcare organizations can create individual healthcare plans based on personal-ized information and machine intelligence, saving costs, improving outcomes and increasing productivity of medical services. For example, if the client was at the doctor’s place, the system will only have a document stating that the medical examination took place, and the diagnosis and the medical history will remain with the user. If the customer’s data were veri?ed during the conclusion of the contract, he can send the con?rmed identi?cation data to other companies for the conclusion of new contracts without the need to re-pass the veri?- cation process. In addition to that, transparency and fairness of tariffs and processing of insured events can increase the client’s motivation and interest. 3.2 Blockchain for Tracking and Tracing Medical Fraud The identi?cation of healthcare fraud is another direction of application of the block-chain technology. This affects the concern of the patients that healthcare representatives and organizations used to falsify personal healthcare records and prescriptions. Regardless of whether your employer provides you with health insurance, or if you have taken out a policy for yourself, you can be at risk of fraud. This happens when a person takes advantage of a patient by either inserting into their EHR false diagnoses of medical conditions that are untrue, or by exaggerating the conditions that they do have. The intention is to submit for payment fraudulent insurance claims. Even if a person uses free medical care (which is common in Russia) with the funding coming from the healthcare tax imposed on all registered employers (over 3% of each employee’s income), this means the waste of a healthcare budget that can be allocated for more quality services, higher medical staff compensations, more afford-able care services, etc. Blockchain takes control over the customer healthcare record, tracks all changes, and protects against mistakes and data mismatch. Currently the workload for pharmacies, insurance companies, and doctors in ver-ifying the correctness of prescriptions and reducing fraud and coincidental mistakes is very high. Insurance companies more often than other ?nancial institutions suffer from fraud. Sometimes claims are denied because of incomplete or incorrect information. Blockchain allows one to check the customer and every particular case with minimal costs. Manipulation of claim assessments causes patients to suffer huge time delays and loss of claims due to incomplete or ‘mismanaged’ records. A blockchain that connects hospitals, physicians, lab vendors and insurers could enable a seamless flow of health information for improved underwriting and validating of claims. 
Among the bene?ts we can state the fact that insurance companies will need to spend less time checking data, that they can trust the data presented to them, not only from the access given to them by the patient, but also from the notes provided from the medical professional. The burden of patient losses will be reduced as well as the cost of 632 S. Avdoshin and E. Pesotskaya disputes, an insurance company will have become completely transparent and would be able to suggest a more personalized care plan based on accurate medical records. EHR fraud and operational mistakes are not the only reasons for using blockchain technology. Some participants can see the bene?ts to secure drug provenance, manage inventories and provide an auditable drug trail. Drug production and distribution involves many participants - manufacturers, distributors, wholesalers and pharmacies who want to know the true source of the drug and track distribution from the factory floor to the end user. A blockchain-based solution can help build such trust in healthcare products and their supply chain. Manufacturers can record drug batches as blockchain transactions tagged with a QR code revealing batch details. Records on a blockchain cannot be modi?ed, updates to records are stored on the blockchain by writing the updated version of the full record to the blockchain with all versions of the record available. The drug batch details are immutable once con?rmed on the block-chain. A single tracking identi?er is established via a QR code across the distribution chain. All downstream participants can trust a drug batch based on the scanned QR code and use the same data to track further distribution, they can buy or sell the drug post-veri?cation using the QR code returned by the blockchain. This greatly simpli?es and streamlines the distribution management that can pre-vent the drugs from falling into the wrong hands, authenticating the drug for the end consumer which greatly reduces the counterfeiting possibility, price manipulation and delivery of expired drugs [12]. Another advantage of using blockchain in this scenario includes the safety of the patient as spurious drugs cannot enter the distribution chain. The true source of the drug can be irrefutably proved as manufactured batches are recorded on a blockchain as a single source of truth available to all participants. Each participant in a blockchain can verify the drug before it is purchased and after it is received [13]. Within a few seconds, the blockchain technology will allow patients to check the drugs for authenticity learn the manufacturer and track the history of the movement through the delivery chain. 3.3 Blockchain for Arti?cial Intelligence Arti?cial Intelligence (AI) in the health sector uses algorithms and software to simulate human abilities in the analysis of complex medical data. A huge amount of medical data pushes the development of applications with AI, although it should be noted that AI has not yet reached the full potential for the healthcare industry, as this requires a large and diverse range of data to ensure accuracy and effective results. Blockchain technology allows creating a platform where patients can discuss their medical data with an advanced arti?cial intelligence “doctor”. This functionality might help healthcare providers and medical companies to provide services, which will allow their patients to have personalized (based on health data) AI-powered conversations about their health. 
Also, it will improve patient care and experience through an advanced natural dialogue system which will be able to generate insights from combined medical data [14]. With artificial intelligence, healthcare specialists and primary care physicians are able to diagnose a patient with a given symptom quickly, taking into consideration what treatment has worked in the past for similar diseases (leveraging all of the medical data, e.g. blood tests, MRI results, X-rays, echocardiograms, etc.) and how well it has worked. This principle can be applied to diagnosing illnesses as well. Whatever can be converted into alphanumeric data will be input into the AI neural network. This enables the system to be trained to assist medical professionals, helping them to diagnose conditions quickly and recommend treatment plans based on an individual's personal medical profile and their symptoms. An artificial intelligence platform can be launched on the blockchain that is able to predict and diagnose ailments based on a vast database of previous diagnostic histories and the results of medical examinations. Patients will be able to approve their data to be used for this, while doctors will be able to narrow down options quickly for diagnosis and treatment with the help of an intelligent platform with patient data from all around the globe, as the MediBond platform [15] announces its intention of doing. The more participants there are, the greater the value of the network. 3.4 Blockchain for Secure and Guaranteed Payments Blockchain technology helps to create an ecosystem through smart contracts and digital currency, so that all participants – patients, doctors, healthcare providers, researchers and medical institutions – are financially motivated and secured. In this context "Smart" means "without intermediaries" - e.g. banks, financial organizations, insurance companies or brokers. A smart contract is also "technically executed": without execution there is no payment. Smart contracts are written to execute given conditions, to eliminate the risk of relying on someone else to follow through on their commitments. This is particularly important for value-based healthcare, in which payments are tied to outcomes. For convenience, the agreement and the patient's signature can be digital. The patient pays for the medical services - visits, consultations, tests, etc. - with tokens (cryptocurrency). The distributed nature of blockchain technology makes it possible to accept payments and pay healthcare providers for their contribution globally. This mechanism avoids complicated legal and accounting procedures supported by assigned specialists who charge fees for their services. This method of payment makes it possible for any individual, no matter where they are in the world, to purchase services without the need to pay additional charges related to processing credit card transactions. The protection of patients' rights is assured without the need to involve additional third parties, such as expensive lawyers, or entities to ensure that the correct treatment has been prescribed. Once the conditions of the smart contract have been met, the payment will automatically be taken from the patient's account and be deposited into the service provider's account. Smart contracts offer several advantages: they are a reliable and transparent payout mechanism for the customer that enables automation of claims handling and can be used to enforce contract-specific terms.
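As a concrete illustration of such condition-gated payouts, the sketch below models the contract logic only (a hypothetical Python toy, not a deployable smart contract on any particular platform; the participant names, condition labels and token amounts are invented): tokens move from the patient's balance to the provider's exactly once, and only after every agreed condition has been confirmed.

from dataclasses import dataclass, field

@dataclass
class TreatmentContract:
    """Toy model of a condition-gated payout: tokens are released to the
    provider only once every agreed condition has been confirmed."""
    patient: str
    provider: str
    fee_tokens: int
    conditions: set                      # e.g. {"treatment_completed", "records_delivered"}
    confirmed: set = field(default_factory=set)
    settled: bool = False

    def confirm(self, condition: str) -> None:
        if condition in self.conditions:
            self.confirmed.add(condition)

    def settle(self, balances: dict) -> bool:
        """Execute the payout only if every condition has been confirmed."""
        if self.settled or self.confirmed != self.conditions:
            return False
        if balances[self.patient] < self.fee_tokens:
            return False
        balances[self.patient] -= self.fee_tokens
        balances[self.provider] += self.fee_tokens
        self.settled = True
        return True

balances = {"patient-42": 100, "clinic-7": 0}
contract = TreatmentContract("patient-42", "clinic-7", 30,
                             {"treatment_completed", "records_delivered"})
contract.confirm("treatment_completed")
assert not contract.settle(balances)        # one condition still missing
contract.confirm("records_delivered")
assert contract.settle(balances)            # payout executes exactly once
assert balances == {"patient-42": 70, "clinic-7": 30}

Publishing such a rule on a shared ledger means both sides can read it and neither can change it unilaterally once it has been agreed.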
It means that in the case of illness or an accident, a smart contract can ensure that the claim is only paid out if the patient recovers and received full treatment in the preferred hospital as prede?ned by the insurer. Although such programs could also be implemented without blockchain, but a blockchain-based smart contract platform could provide substantial network effects - an increased degree of transparency and credibility for customers due to decentralization. 634 S. Avdoshin and E. Pesotskaya Smart contracts offer a great bene?t to Insurance companies as their business depends directly on data that is available to the insurance specialist, and this data needs to be reliable and trustworthy. Insurance contracts are usually complicated and hard to understand for the majority of people, as they contain legal terminology. Smart con-tracts help to make the insurance industry more transparent and friendly to both current and potential clients. 3.5 Blockchain for Medical Research Blockchain technology enables research and discovery. With smart contracts, it becomes possible to reward healthcare content creators in proportion to how everyday visitors perceive their content (e.g. “likes” that get recorded). Moreover, rewards are an additional push for medical professionals to sign up to have a free mobile-friendly online pro?le. Healthcare companies can use the blockchain-based platform to reach potential clinical trial participants who ?t a certain medical history or care plan. The traditional amount of time and effort required to source such participants is greatly reduced, as well as the dependency on health systems to act as intermediaries. Additionally, the use of such blockchain-based systems facilitates longitudinal tracking of trial participants. This is of most importance though, this also reduces the risks and increases the ef?- ciency of these trials through means of participation that has been tailored to speci?c health or genomic pro?les. Sometimes medical researchers mine the network as the healthcare community (patients, doctors) release access to aggregate, anonymous medical data as transaction “fees” that become mining rewards. In some blockchain research platforms (e.g. MedRec) researchers can influence the metadata rewards that providers release by selectively choosing which transactions to mine and validate. Providers are then incentivized to match what researchers are willing to accept, within the boundaries of proper privacy preservation. Patients and providers can limit how much of their data is included in the available mining bounties. This approach helps engage participants in health research, facilitates collaboration, and fosters an environment of fast-paced learning, seeking better treatment options and cures for the patients, enables the creation of new communities of individuals who have a desire to connect with others that share a similar condition, learn about treatment options, share their experiences, and participate in research. For example, in the Bur-stIQ platform, individuals can browse the marketplace and make a request to participate in a research initiative or patient community. Additionally, individuals have the option to donate or sell their data to a research initiative or population data repository [16]. Among the advantages we can also mention a deep learning environment that con-tinuously expands the knowledge of an individual to improve relevance and impact. 
Researchers can find and access the people and data they need to support their research, and collaborate with other researchers to explore new ideas. They are able to connect directly with the right participants, reducing the cost and time-scale of both academic and commercial health research. 4 Blockchain Solutions Blockchain is a digital platform that stores and verifies the entire history of transactions between users across the network in a tamper-proof manner. Transactions between users or counterparties are broadcast across the network, verified by cryptographic algorithms, and grouped into blocks. At the moment, there are several competing protocols and a handful of other proprietary middleware and application development suites for each protocol. They differ in permissions, functionality, access rights and decision-making processes inside the network. The terminology around blockchain is still confusing. In different sources we can find different definitions of blockchain and different classifications. In this paper we will distinguish between public and private blockchains, as well as between permissionless blockchains and permissioned (exclusive) blockchains. Each public blockchain can be inspected by anyone, whereas private blockchains can only be inspected by computers that have been granted access rights. Some of the solutions use an approach that involves tracking data modifications on a private blockchain and recording hashes of these changes on a public blockchain. In this approach, the public blockchain effectively serves as a notary for data modifications by verifying that they occurred and at what time [17]. The majority of blockchain solutions were inspired by Bitcoin's (https://bitcoin.org/) original protocol, first described in 2008, which aimed to provide an alternative to the formal financial system, and made possible a blockchain data structure in which every modification of data on a network is recorded as part of a block of other data modifications that share the same timestamp. The Bitcoin blockchain is a public, permissionless network where participants are able to access the database, store a copy, and modify it by making their computing power available. Bitcoin, as a public network, offers an open, permissionless invitation for anyone to join. If the dominant requirement is a trust mechanism between strangers who know nothing about each other, then a public network may be the way to go. For digital or crypto-currencies, such as bitcoin, this acts as a catalyst for driving greater adoption globally, enabling more people to make purchases with these currencies [18]. The most notable non-Bitcoin public blockchain is Ethereum (https://www.ethereum.org/), which was created in 2014. Like Bitcoin, Ethereum is also permissionless, runs on a public peer-to-peer (P2P) network, utilizes a cryptocurrency ("ether"), and stores information in blocks. Compared to Bitcoin, which was solely designed to store information about transactions, Ethereum is a programmable blockchain that also allows users to deploy self-executing computer scripts and has much broader functionality. It provides a built-in programming language and an open-ended platform that allows users to create decentralized applications of unlimited variety.
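The block data structure just described - modifications grouped under a shared timestamp, with each block linked to its predecessor by a hash - can be sketched as follows. This is a deliberately simplified illustration: it omits the consensus, signatures and Merkle trees used by real networks such as Bitcoin and Ethereum, and the example modification strings are invented.

import hashlib
import json
import time

class Block:
    """A block groups the data modifications that share one timestamp and
    points to its predecessor through that block's hash."""
    def __init__(self, modifications, prev_hash):
        self.timestamp = time.time()
        self.modifications = list(modifications)
        self.prev_hash = prev_hash
        self.hash = self.compute_hash()

    def compute_hash(self):
        payload = json.dumps({"ts": self.timestamp,
                              "mods": self.modifications,
                              "prev": self.prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class Chain:
    def __init__(self):
        self.blocks = [Block(["genesis"], "0" * 64)]

    def add_block(self, modifications):
        self.blocks.append(Block(modifications, self.blocks[-1].hash))

    def is_valid(self):
        """Recompute every hash and check each link to the previous block."""
        prev_hash = "0" * 64
        for block in self.blocks:
            if block.prev_hash != prev_hash or block.hash != block.compute_hash():
                return False
            prev_hash = block.hash
        return True

chain = Chain()
chain.add_block(["record A updated", "access granted to provider X"])
chain.add_block(["prescription issued for patient P-001"])
print(chain.is_valid())                       # True
chain.blocks[1].modifications[0] = "forged"   # altering history...
print(chain.is_valid())                       # ...breaks the chain: False

Because every block commits to the hash of the previous one, rewriting any historical entry invalidates all later blocks unless the whole chain is recomputed, which is exactly what the consensus mechanism of a public network is designed to prevent.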
While distributing computing across a P2P network necessarily results in slower and more expensive computation than nor-mal, it also creates a database that is agreed to by consensus, available to all partici-pants simultaneously, and permanent, all of which are useful when trust is a primary concern. Bitcoin and Ethereum are both public, permissionless blockchains, which anyone with the appropriate technology can access and contribute to. Companies use these open-ended platforms to build their customized solutions. For instance, HealthHeart’s 636 S. Avdoshin and E. Pesotskaya platform (https://www.healthheart.io/) uses the Ethereum functionality for assigning unique addresses to patients, medical care providers, organizations, etc. and restricts access to a patient’s addresses and link them to the full history of transactions for a given identity on the blockchain, thus creating an audit trail for all events within a medical record. It supports reviews of past transactions by consumers, providers and third party entities that have been granted access, facilitates the connection between the consumer and the care provider. Public blockchains offer maximum transparency and its main goal is to prevent the concentration of power. However, many private ?rms are uncomfortable relying on public blockchains as a platform for their business operations due to concerns about privacy, governance, and performance. For instance, within the banking industry organizations prefer to transact only with trusted peers. For this reason IBM (https://www.ibm.com) has invested signi?cant resources into helping the Linux Foundation design an open-source modular blockchain platform called Hyperledger Fabric (https://www.hyperledger.org) which provides programmers with a “blockchain builders kit”, and allows them to tailor all elements of a ledger solution, including the choice of the consensus algorithm, whether and how to use smart contracts, and the level of permissions required. It is another permissioned network which provides collectively de?ned membership and access rights within a given business network. Fabric is designed for organizations that need to meet con?- dential obligations to each other without passing everything through a central authority and ensuring con?dentiality, scalability and security. Also a number of startups, including Ripple (https://ripple.com/) and the R3 Consortium (https://www.r3.com/), a group of more than 70 of the world’s largest ?nancial institutions that focuses on developing blockchain permissioned solutions for the industry, have developed platforms that run on private or permissioned networks on which only veri?ed parties can participate [19]. Consortium blockchains are usually open to the public but not all data is available to all participants, while private blockchains provide another type of permission and access rights to users. In private networks a central authority manages the rights to access or modify the database. The system can be easily incorporated within infor-mation systems and offers the added bene?t of an encrypted audit trail. In private blockchains, the network has no need to encourage miners to use their computing power to run the validation algorithms. 5 Conclusion Blockchain technology is gradually becoming very popular. The bene?ts of blockchain are enormous, from decentralization, to security and scalability, to privacy and affordability. 
Both health professionals and organizations will be able to work faster and more ef?ciently, relative to how accessible, safe and trustworthy the information available is. Professionals in the industry that are provided open access to this reliable information would be able to predict future trends, keep track of pharmaceutical inventories, amongst other things. As a result, the general population would have improved health and a higher quality of life. Blockchain Revolution in the Healthcare Industry 637 Still there are a huge barriers to blockchain adoption, such as regulatory issues (45%), followed by concerns over data privacy (26%) [20]. In the case of Russia – it also does not have the required regulatory base and needs to provide targeted government-backed funding with a speci?c focus on remote medical services and their integration into existing healthcare programs. A major issue with data processing lies in the fact that patient information is stored in different places, information is being lost or concealed through the fault of the patient or the doctor, while there are no personalized analytics. The regulatory concerns are linked to a decentralized infrastructure that can’t be controlled by any person or group. Also not everyone is approaching blockchain positively - there is an opinion that blockchain technology is relatively new, and its business advantages are unproven, it requires non-trivial computing infrastructure changes, though this is not completely accurate. There are many startups that have already proved the fact that blockchain technology has a positive effect on the cost of provided services, positively influences the delivery of care and the collaboration between different interested parties. Despite this, in order to maintain regulation compliant with global health standards, it is necessary to establish a consistent approach to compliance framework and implementation through standardized pro-cesses and interoperability. Not only standards need to be in place, but there also should be a level of con?dence and motivation from people before any organization can adopt new blockchain technology. For future work, the authors intend to improve this review paper with innovative research, enrich with more quantitative data. A framework for analysis of existing ICOs and solutions supported by a case study can be initiated. This framework would help to evaluate and predict the effects of different blockchain projects in healthcare. A set of criteria should be developed; the KPI measurement metrics and a validation model should be identi?ed to choose the most trusted provider by looking at the different perspectives in the framework. References 1. Espinel, V., Brynjolfsson, E., Annunziata, M.: Global Agenda Council on the Future of Software & Society. Deep Shift: Technology Tipping Points and Societal Impact. World Economic Forum Homepage. http://www3.weforum.org/docs/WEF_GAC15_Technological_ Tipping_Points_report_2015.pdf. Accessed 20 Jan 2018 2. 2017 global health care sector outlook. Deloitte Homepage. https://www2.deloitte.com/ content/dam/Deloitte/global/Documents/Life-Sciences-Health-Care/gx-lshc-2017-health-care- outlook-infographic.pdf. Accessed 20 Jan 2018 3. Schatsky, D., Piscini, E.: Deloitte survey: blockchain reaches beyond ?nancial services with some industries moving faster. Deloitte Homepage. 
https://www2.deloitte.com/us/en/pages/ about-deloitte/articles/press-releases/deloitte-survey-blockchain-reaches-beyond-?nancial-services- with-some-industries-moving-faster.html. Accessed 20 Jan 2018 4. Till, B., Peters, A., Afshar, S., Meara, J.: From blockchain technology to global health equity: can cryptocurrencies ?nance universal health coverage?. BMJ Global Health Homepage. http://gh.bmj.com/content/2/4/e000570. Accessed 20 Jan 2018 638 S. Avdoshin and E. Pesotskaya 5. Hogan, S., Fraser, H., Korsten, P., Pureswaran, V., Gopinath R.: Healthcare rallies for blockchain: keeping Patients at the center. IBM Corporation Homepage. https://www-01. ibm.com/common/ssi/cgi-bin/ssialias?html?d=GBE03790USEN&. Accessed 20 Jan 2018 6. Blockchain Investment Trends in Review. CBInsights Homepage. https://www.cbinsights. com/research/report/blockchain-trends-opportunities/. Accessed 20 Jan 2018 7. Internet of Medical Things, Forecast to 2021. Reportlinker Homepage. http://www. prnewswire.com/news-releases/internet-of-medical-things-forecast-to-2021-300474906.html . Accessed 20 Jan 2018 8. Avdoshin, S., Pesotskaya, E.: Mobile healthcare: perspectives in Russia. Bus. Inform. 3(37), 7–13 (2016) 9. Embrace Disruptive Medical Technologies. The Medical Futurist Homepage. http:// medicalfuturist.com/grand-challenges/disruptive-medical-technology/. Accessed 20 Jan 2018 10. Protenus Releases 2016 Healthcare Data Breach Report. HIPAA Journal Homepage. https:// www.hipaajournal.com/protenus-releases-2016-healthcare-data-breach-report-8656. Acces-sed 20 Jan 2018 11. Katz, D.: The Trust Machine. The Economist Homepage. https://www.economist.com/news/ leaders/21677198-technology-behind-bitcoin-could-transform-how-economy-works-trust-machine. Accessed 20 Jan 2018 12. Gilbert, D.: Blockchain Technology Could Help Solve $75 billion Counterfeit Drug Problem. International Business Times Homepage. http://www.ibtimes.com/blockchain-technology- could-help-solve-75-billion-counterfeit-drug-problem-2355984. Accessed 20 Jan 2018 13. Chowdhury, C., Krishnamurthy, R., Ranganathan, V.: Blockchain: A Catalyst for the Next Wave of Progress in Life Sciences. Cognizant Homepage. https://www.cognizant.com/ whitepapers/blockchain-a-catalyst-for-the-next-wave-of-progress-in-the-life-sciences-industry- codex2749.pdf. Accessed 20 Jan 2018 14. Vitaris, B.: The Next Doctor You Consult Could Be a Robot: Healthcare Meets AI and the Blockchain. Bitcoin Magazine Homepage. https://bitcoinmagazine.com/articles/next-doctor-you- consult-could-be-robot-healthcare-meets-ai-and-blockchain/. Accessed 20 Jan 2018 15. Steffens, B., Billot, J., Marques, A., Gawas, D., Harmalkar, O.: Facilitate health care on block chain. MediBond Homepage. https://medibond.io/doc/medibond_whitepaper.pdf. Accessed 20 Jan 2018 16. Ricotta, F., Jackson, B., Tyson, H., et al.: Bringing Health to Life. BurstIq Homepage. https://www.burstiq.com/wp-content/uploads/2017/09/BurstIQ-whitepaper_07Sep2017.pdf. Accessed 20 Jan 2018 17. Pisa, M., Juden, M.: Blockchain and Economic Development: Hype vs. Reality. Center for Global Development Homepage. https://www.cgdev.org/sites/default/?les/blockchain-and-economic- development-hype-vs-reality_0.pdf. Accessed 20 Jan 2018 18. Vaidyanathan, N.: Divided we fall, distributed we stand. The Association of Chartered Certi?ed Accountants (ACCA) Homepage. http://www.accaglobal.com/lk/en/technical-activities/ technical-resources-search/2017/april/divided-we-fall-distributed-we-stand.html. Accessed 20 Jan 2018 19. 
Adam-Kalfon, P., El Moutaouakil, S.: Blockchain, a catalyst for new approaches in insurance. PwC Homepage. https://www.pwc.com.au/publications/pwc-blockchain.pdf. Accessed 20 Jan 2018 20. Strachan, J.: Pharma Backs Blockchain. The Medicine Maker Homepage. https:// themedicinemaker.com/issues/0717/pharma-backs-blockchain/. Accessed 20 Jan 2018 Blockchain Revolution in the Healthcare Industry 639 Effective Reversible Data Hiding in Electrocardiogram Based on Fast Discrete Cosine Transform Ching-Yu Yang1,2(&) , Lian-Ta Cheng1,2 , and Wen-Fong Wang1,3 1 Department of Computer Science and Information Engineering, National Penghu University of Science and Technology, Magong, Penghu, Taiwan chingyu@gms.npu.edu.tw 2 National Penghu University of Science and Technology, Magong, Taiwan 3 National Yunlin University of Science and Technology, Douliu, Yunlin, Taiwan Abstract. Based on the fast discrete cosine transform (FDCT), the authors present an effective reversible data hiding method for electrocardiogram (ECG) signal. First, an input ECG data is transformed into a series of non-overlapping bundles by one-dimensional (1-D) FDCT. The FDCT bundles are subsequently attributed into two disjoint subsets according to a simple classi-?cation rule. Then, two pieces of data bits in different length are separately embedded in the selected coef?cients of the classi?ed bundles via the least signi?cant bit (LSB) technique. Simulations con?rmed that the hidden message can be extracted without distortion while the original ECG signal can be fully recovered. In addition, the perceived quality of the proposed method is good while the hiding capacity is superior to existing techniques. Since computational complexity is simple, the proposed method is feasible to be applied in real-time applications, or to be installed in the health care (or wearable) devices. Keywords: Data hidingReversible ECG steganography Fast discrete cosine transform (FDCT) LSB technique 1 Introduction With the maturity of arti?cial intelligence algorithms, the popularization of the Internet of Things, and the flexible use of big data, people and organizations can easily use the diversity services such as the World Wide Web, e-mail, e-commerce, online news, and social networking from the Internet. However, if the handling of important (or con?- dential) data does not properly conduct, it is possible for crucial resources to be compromised. Namely, the content of the message could be intercepted, eavesdropped, or forged by adversaries (or hackers) during transmission. One of an economical manner to protect (or secure) the information assets is the use of data hiding techniques. In general, data hiding can be divided into two categories: steganography and digital watermarking [1, 2]. The applications of both approaches are quite difference. The main aims of the steganographic methods [3, 4] are to conceal secret bits in host media © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 640–648, 2019. https://doi.org/10.1007/978-3-030-02686-8_48 while maintaining an acceptable perceptual quality, whereas the primary goals of digital watermarking [5, 6] try to achieve robustness with a limited hiding payload. To secure patients’ diagnoses such as blood pressure, blood glucose level and body temperature, as well as name, ID number, address and patient history and other sensitive information, some researchers have developed the data hiding methods in biometrics, such as electrocardiogram (ECG) or electromyography (EMG). 
However, most ECG steganography methods [7, 8] were incapable of restoring the original ECG signal after the extraction of the hidden message. As host biometric signals are valuable to hospitals and individuals, it is undesirable for the host data to be damaged after bit extraction. To completely recover the original hosts and successfully extract the hidden message at the receiver site, several authors have designed reversible ECG steganography methods to achieve this goal [9, 10]. Yang and Wang [9] presented two types of data hiding methods for ECG signals, namely lossy and reversible ECG steganography. To preserve the originality of the host ECG data, a reversible version of data hiding for ECG signals was proposed. By employing the mean-predicted technique and coefficient alignment, data bits were embedded in the predefined bundles of the host ECG. Simulations revealed that the hidden bits were extracted successfully while the original ECG signal could be restored completely. The average payload of the method was 44.07 Kb with a signal-to-noise ratio (SNR) of 34.78 dB. Based on the Hamming code and matrix coding techniques, Shiu et al. [10] suggested a reversible data hiding method for ECG and EMG signals. Simulations indicated that the hiding capacity of their method was larger than those of existing techniques, but the average SNR was only 17.99 dB. Since the perceived quality of the marked ECG signal was severely distorted, it is of no use for clinical diagnosis in medicine. In this article, we propose a simple but effective reversible ECG steganography method, which is capable of providing high hiding storage with good perceptual quality. The remainder of this paper is organized as follows. Section 2 specifies the procedure of bit embedding/extraction, plus overhead analysis and discussion. Section 3 presents the demonstrations of the proposed method, and Sect. 4 provides the conclusion.
2 Proposed Method
First, an ECG host is transformed into a series of non-overlapping bundles via the FDCT [11–13]. The FDCT bundles are subsequently attributed into two disjoint subsets according to a simple classification rule. Then, two pieces of data bits of different length are separately embedded in the target coefficients of the classified bundles. The details of bit embedding/extraction of the proposed method are specified in the following sections.
2.1 Bit Embedding
Let A_j be the jth bundle of size 1 × n derived from a host ECG, and let H_j = {s_{ji}}_{i=0}^{n-1} be the corresponding non-overlapping bundle of 1-D FDCT coefficients, obtained by performing the FDCT on A_j with n = 8, as shown in Fig. 1. The FDCT bundles are represented by I = {H_j | j = 1, 2, ..., |I|} with H_j = ⌊10 · A_j X⌋, where X is a predetermined 8 × 8 matrix, as shown in (1). [Note that, to ensure that reversible ECG steganography can be achieved, the values of s_{ji} in H_j are obtained by applying a floor function to the product of 10 and A_j X.]

X = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
3/2 & 5/4 & 3/4 & 3/8 & -3/8 & -3/4 & -5/4 & -3/2 \\
1 & 1/2 & -1/2 & -1 & -1 & -1/2 & 1/2 & 1 \\
5/4 & -3/8 & -3/2 & -3/4 & 3/4 & 3/2 & 3/8 & -5/4 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
3/4 & -3/2 & 3/8 & 5/4 & -5/4 & -3/8 & 3/2 & -3/4 \\
1/2 & -1 & 1 & -1/2 & -1/2 & 1 & -1 & 1/2 \\
3/8 & -3/4 & 5/4 & -3/2 & 3/2 & -5/4 & 3/4 & -3/8
\end{bmatrix}^{-1}.   (1)

The main procedure of bit embedding of the proposed method is specified in the following algorithm.
Algorithm 1. Hiding a secret message in an ECG host.
Input: Host ECG data E, scrambled secret message W, and control parameter µ.
Output: Marked ECG data Ẽ and bitmap Ω.
Method:
Step 0. Perform the forward FDCT on E to obtain the 1-D FDCT bundles I.
Step 1. Input a bundle H_j from I. If the end of input is encountered, then proceed to Step 5.
Step 2. Compute the average T of the absolute values of the coefficients of H_j; if T ≤ µ, then mark this bundle with bit "0", otherwise mark it with bit "1", and save the mark in the bitmap Ω.
Step 3. If the bundle is marked with bit "0", then take three (and two) data bits from W each time and embed them in the coefficients {s_{ji}}_{i=0}^{3} (and {s_{ji}}_{i=4}^{n-2}) by the LSB technique, respectively, and return to Step 1.
Step 4. If the bundle is marked with bit "1", then take two data bits from W each time and embed them in the coefficients {s_{ji}}_{i=0}^{3} by the LSB technique, and return to Step 1.
Step 5. Perform the inverse FDCT on the marked bundles to form the marked ECG data Ẽ.
Step 6. Stop.
Fig. 1. Bundle of size 8.
To alleviate distortion and obtain better hiding capability during the encoding phase, two pieces of data bits of different length are separately employed at Steps 3–4. Namely, each time there are (3 × 4) + (2 × 3) = 18 and 2 × 4 = 8 bits embedded in the two classified bundles, respectively.
2.2 Bit Extraction
The decoding part of the proposed method is summarized here.
Algorithm 2. Extracting the hidden message from the marked ECG data and restoring the original ECG host.
Input: Marked ECG data Ẽ, the control parameter µ, and the bitmap Ω.
Output: A secret message W and host ECG data E.
Method:
Step 0. Perform the forward FDCT on Ẽ to obtain the 1-D FDCT bundles Î, and read in the bitmap Ω.
Step 1. Input a bundle Ĥ_j derived from Î. If the end of input is encountered, then proceed to Step 4.
Step 2. If the bundle is marked with bit "0", then extract eighteen hidden bits from the coefficients {s_{ji}}_{i=0}^{n-2}, restore the host bundle, and go to Step 1.
Step 3. If the bundle is marked with bit "1", then extract eight hidden bits from the coefficients {s_{ji}}_{i=0}^{3}, restore the host bundle, and go to Step 1.
Step 4. Descramble and assemble all extracted bits, and perform the inverse FDCT on the restored bundles to recover the original ECG data E. (Notice that the marked ECG data Ẽ was obtained by conducting Algorithm 1.)
Step 5. Stop.
2.3 Overhead Analysis and Discussion
From Algorithm 1 we can see that it requires one bit to record the attribute of each FDCT bundle in the bitmap Ω. The auxiliary information (O_h) of the proposed method is therefore O_h = |I|. For example, if the size of an input host ECG is 30,000 and the size of a bundle is set to 8, then the overhead of the proposed method is O_h = 30,000/8 = 3,750 bits. Notice as well that the overflow issue can be avoided during the encoding process. In general, the value of the coefficient s_{j(n-1)} is often significantly larger than those of the remaining coefficients s_{j0}, ..., s_{j(n-2)} of H_j after the FDCT operation. The role of the coefficient s_{j(n-1)} is similar to that of the DC coefficient in the conventional DCT domain. In other words, if data bits were embedded in this coefficient, severe distortion would be introduced during the process of encoding. Therefore, the proposed method embeds secret bits only in the remaining coefficients s_{j0}, ..., s_{j(n-2)} of H_j.
3 Experimental Results
The simulations of the proposed method were implemented in the Matlab (R2015b) programming language on a Microsoft Windows 10 laptop with an Intel Core(TM) i5-6300U 2.4 GHz CPU and 8 GB RAM. The host ECG signals were derived from the MIT-BIH arrhythmia database [14]. Several host ECG data sets were utilized in our experiments. The size of each test set was 30,000. The average execution time of the proposed method was 0.125 s.
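Before turning to the results, the classification and LSB bookkeeping of Algorithms 1 and 2 can be illustrated with a short sketch. The Python fragment below is a toy under stated assumptions: it starts from already-transformed integer coefficients (the integer FDCT of Eq. (1) and the exact restoration of the original coefficients, which give the method its reversibility, are not reproduced), it uses µ = 9 as in the experiments, and the coefficient values and message bits are invented.

import numpy as np

MU = 9          # classification threshold (the paper's control parameter mu)
N = 8           # bundle length

def embed_bundle(coeffs, bits, pos):
    """Embed message bits into the LSBs of one bundle's integer coefficients,
    following the two-class rule of Algorithm 1. Returns the marked bundle,
    the class flag for the bitmap, and the new position in the bit string."""
    c = coeffs.copy()
    smooth = bool(np.mean(np.abs(c)) <= MU)     # Step 2: classify the bundle
    # (bits per coefficient, coefficient indices); s_{j,7} is left untouched
    plan = [(3, range(0, 4)), (2, range(4, N - 1))] if smooth else [(2, range(0, 4))]
    for width, idxs in plan:
        for i in idxs:
            chunk = bits[pos:pos + width]
            if len(chunk) < width:
                return c, smooth, pos            # message exhausted
            c[i] = (int(c[i]) & ~((1 << width) - 1)) | int(chunk, 2)  # LSB substitution
            pos += width
    return c, smooth, pos

def extract_bundle(coeffs, smooth):
    """Read the hidden bits back, using the bitmap entry for the bundle."""
    plan = [(3, range(0, 4)), (2, range(4, N - 1))] if smooth else [(2, range(0, 4))]
    out = []
    for width, idxs in plan:
        for i in idxs:
            out.append(format(int(coeffs[i]) & ((1 << width) - 1), f"0{width}b"))
    return "".join(out)

bundle = np.array([4, -3, 7, 2, -1, 5, 0, 21])   # toy integer coefficients
message = "110010101110100101"                   # 18 bits fit a "smooth" bundle
marked, smooth, used = embed_bundle(bundle, message, 0)
assert extract_bundle(marked, smooth) == message[:used]

A "smooth" bundle carries 18 bits and a "steep" one 8 bits, which is why hosts with fewer drastic variations reach a higher net payload in the experiments below.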
The relationship between the average SNR/PRD and the net payload of the proposed method for various values of the mean threshold (µ) is drawn in Fig. 2. It can be seen that the lower the value of µ, the larger the SNR value and the smaller the hiding capacity, and vice versa. In our proposed method, to achieve the desired net payload, SNR value, and perceived quality, the value of µ was set to 9. Table 1 indicates the net payload, SNR, and PRD of the proposed method using µ = 9. The average SNR/PRD of the proposed method is 40.74 dB/0.0093 with an average net payload of 45.80 Kb. In addition, the relationship between the average SNR and the net payload of the proposed method using five different inputs with various µ is depicted in Fig. 3. From the figure we can see that ECG100 has the best performance among all the input data. The hiding performance of ECG102 takes second place, followed by ECG101, ECG103, and ECG104. One of the main reasons ECG104 ranks in last place is that it contains more steep areas (or drastic variations) than smooth ones, meaning that the corresponding coefficients in the FDCT bundles are often larger than µ, so fewer data bits can be embedded in ECG104. The SNR and PRD are defined as follows:

SNR = 10 \log_{10} \frac{\sum_i s_i^2}{\sum_i (s_i - \hat{s}_i)^2}   (2)

and

PRD = \sqrt{\frac{\sum_i (s_i - \hat{s}_i)^2}{\sum_i s_i^2}},   (3)

where s_i and \hat{s}_i are the data in the original ECG and the marked ECG divided by 10, respectively. Generally speaking, the larger the value of SNR, the smaller the PRD, and the better the perceived quality that can be obtained. Close observations of the host and the marked ECGs, namely ECG100, ECG101, ECG111, and ECG220 (the first 5 s), are drawn in Fig. 4. The resultant SNR and net payload are also depicted in the figures. It is clear that the perceived quality is not bad: no apparent distortion exists in the marked ECGs. As described previously, the smaller the proportion of steep areas (or drastic variations), the better the hiding capability of the proposed method. From Fig. 4 we can see that ECG100 (in Fig. 4a) provides the best hiding capability, whereas ECG111 provides the least hiding storage (in Fig. 4c). A performance comparison between our method and existing techniques [9, 10] is listed in Table 2. It is obvious that the average SNR of our method is much larger than that of Yang and Wang's technique [9] when the average net payload is around 44 Kb. Although the hiding storage of Shiu et al.'s approach [10] is the largest among the compared methods, their resultant SNR is not good. Since a low SNR implies a poor perceived quality of the marked ECG signals, it is not feasible for medical staff to use them in the diagnosis of patients.
Fig. 2. The relationship between the average SNR/PRD and net payload of the proposed method with various µ: (a) average SNR vs. net payload and (b) average PRD vs. net payload.
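The two quality measures in (2) and (3) translate directly into code; the following minimal Python helpers compute them for an original and a marked signal (the sample values are made up purely for the check).

import numpy as np

def snr_db(original, marked):
    """SNR of Eq. (2): 10*log10( sum(s_i^2) / sum((s_i - s_hat_i)^2) )."""
    s, s_hat = np.asarray(original, float), np.asarray(marked, float)
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

def prd(original, marked):
    """PRD of Eq. (3): sqrt( sum((s_i - s_hat_i)^2) / sum(s_i^2) )."""
    s, s_hat = np.asarray(original, float), np.asarray(marked, float)
    return np.sqrt(np.sum((s - s_hat) ** 2) / np.sum(s ** 2))

# toy check on made-up samples (already divided by 10, as in the paper)
s = np.array([100.3, 101.1, 99.8, 98.2])
s_hat = s + np.array([0.01, -0.02, 0.015, -0.01])
print(round(snr_db(s, s_hat), 2), round(prd(s, s_hat), 4))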
Table 1. Net payload, SNR, and PRD performance of the proposed method using µ = 9
ECG data   Net payload (bits)   SNR (dB)   PRD
ecg100     54,580   41.12   0.0088
ecg101     46,220   41.31   0.0086
ecg102     50,440   41.22   0.0087
ecg103     46,560   39.77   0.0102
ecg104     44,830   39.59   0.0105
ecg111     42,220   43.15   0.0070
ecg112     45,960   42.03   0.0079
ecg113     44,240   38.98   0.0112
ecg114     47,990   42.25   0.0077
ecg115     51,130   38.63   0.0117
ecg121     51,870   42.03   0.0079
ecg220     45,170   37.60   0.0132
ecg221     42,740   41.40   0.0085
ecg222     49,380   42.15   0.0078
ecg223     46,740   40.59   0.0093
ecg230     43,370   40.08   0.0099
ecg231     43,930   40.70   0.0092
Average    46,904   40.74   0.0093
Fig. 3. The relationship between the SNR and net payload of the proposed method with various host ECGs.
Fig. 4. Close observation of the host and the marked ECGs: (a) ECG100, (b) ECG101, (c) ECG111, and (d) ECG220.
Table 2. Net payload/SNR comparison with existing techniques (entries are net payload (bits)/SNR (dB))
ECG data   Yang and Wang (a) [9]   Shiu et al. (b) [10]   Our method
100        45,567/36.89   68,270/19.69   54,580/41.12
121        47,029/37.93   68,270/18.26   51,870/42.03
122        44,683/31.52   68,270/18.61   37,570/40.59
205        44,343/36.09   68,270/17.82   51,140/41.97
207        44,853/37.10   68,270/15.56   44,590/43.38
220        44,921/31.65   N/A            45,170/37.60
230        44,530/32.30   N/A            43,370/40.08
Average    45,132/34.78   68,270/17.99   45,497/39.83
(a) With the reversible version using bundle size = 1. (b) With the (1023, 1013)-Hamming code.
4 Conclusion
In this study, based on a smart processing of the FDCT coefficients, we proposed an effective reversible data hiding method for ECG signals. First, a simple classification rule is applied to the host bundles. Then, two pieces of data bits of different length are separately embedded in the target coefficients of the classified bundles via the LSB technique. Simulations confirmed that the hidden message can be extracted without distortion and that the original ECG signal is completely recovered at the receiver site. In addition, the hiding capacity and SNR/PRD of the proposed method outperform those of existing techniques. Since the processing time of encoding/decoding is short, our method is suitable for implementation in real-time applications, or for deployment in a (mobile) health care device for ECG signal measurement.
References
1. Phadikar, A.: Data Hiding Techniques and Applications Specific Designs. LAP LAMBERT Academic Publishing, Saarbrucken (2012)
2. Zielinska, E., Mazurczyk, W., Szczypiorski, K.: Trends in steganography. Commun. ACM 57, 86–95 (2014)
3. Yang, C.Y., Wang, W.F.: High-capacity steganographic method for color images using progressive pixel-alignment. J. Inf. Hiding Multimed. Signal Process. 6, 815–823 (2015)
4. Li, B., Wang, M., Li, X., Tan, S., Huang, J.: A strategy of clustering modification directions in spatial image steganography. IEEE Trans. Inf. Forensics Secur. 10, 1905–1917 (2015)
5. Hsiao, C.Y., Tsai, M.F., Yang, C.Y.: High-capacity robust watermarking approach for protecting ownership right. In: The 12th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, November 21–23, Kaohsiung, Taiwan (2016)
6. Liu, S., Pan, Z., Song, H.: Digital image watermarking method based on DCT and fractal encoding. IET Image Process. 11, 815–821 (2017)
7. Ibaida, A., Khalil, I.: Wavelet-based ECG steganography for protecting patient confidential information in point-of-care systems. IEEE Trans. Biomed. Eng. 60, 3322–3330 (2013)
8.
Chen, S.T., Guo, Y.J., Huang, H.N., Kung, W.M., Tseng, K.K., Tu, S.Y.: Hiding patients con?dential data in the ECG signal via a transform-domain quantization scheme. J. Med. Syst. 38 (2014). doi: 10.1007/s10916-014-0054-9 9. Yang, C.Y., Wang, W.F.: Effective electrocardiogram steganography based on coef?cient alignment. J. Med. Syst. 40 (2016). doi: 10.1007/s10916-015-0426-9 10. Shiu, H.J., Lin, B.S., Huang, C.H., Chiang, P.Y., Lei, C.L.: Preserving privacy of online digital physiological signals using blind and reversible steganography. Comput. Methods Programs Biomed. 151, 159–170 (2017) 11. Chen, W.H., Smith, C.H., Fralick, S.C.: A fast computational algorithm for the discrete cosine transform. IEEE Trans. Commun. COM-25, 1004–1009 (1977) 12. Feig, E., Winograd, S.: Fast algorithm for the discrete cosine transform. IEEE Trans. Signal Process. 40, 2174–2193 (1992) 13. Liang, J., Tran, T.D.: Fast multiplierless approximations of the DCT with the lifting scheme. IEEE Trans. Signal Process. 49, 3032–3044 (2001) 14. Moody, G.B., Mark, R.G.: The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 20, 45–50 (2001) 648 C.-Y. Yang et al. Semantic-Based Resume Screening System Yu Hou(?) and Lixin Tao Pace University, New York City, NY 10038, USA {yh50276p,ltao}@pace.edu Abstract. At present, XML becomes one of the best choices for storing semi-structured electronic resumes. Most of the companies let the candidates ?ll out their resumes online on the company’s website and store these electronic resumes uniformly. This paper assumes that all candidates’ electronic resumes will be saved in the form of XML, and proposed a Semantic-based Resume Screening System (RSS). The RSS could improve the accuracy and e?ciency in the hiring process by using the Ontology Knowledge Base and the Pace XML Validator. Keywords: Knowledge representation · Web Ontology Language (OWL) · XML Integrated syntax and semantic validation 1 Introduction 1.1 A Subsection Sample Due to the low coverage, poor e?ciency and high cost, the traditional o?ine recruitment mode has been replaced by the internet recruitment mode since the last few decades. The top companies may receive a large number of electronic resumes daily. Therefore, it is challenging for recruiters to store and screen the resumes which are semi-structured. Nowadays, the most popular model is applicants ?ll out their resumes online on the company’s website, which facilitates the uniform store and management of electronic resumes. Since XML has appeared, it becomes the best choice for storing electronic resumes. At present, most of the companies are challenged by screening those semi-structured resumes. It is a heavy work to screen the ideal candidate accurately and e?ciently from a large number of resumes. Manual screening is not only time-consuming, but also has a strong subjectivity. It is di?cult to be guaranteed that the companies can ?nd the ideal candidates from the large-scale resume objectively and e?ciently. The traditional and the most common solution is keyword search, for example, if the HR want to search candidates who graduate from Pace University, then he or she needs to use ‘Pace University’ as the keyword to search in candidates’ resumes. However, this method cannot meet the most HRs’ requirements very well, because some companies HR usually use keyword tags for expressing their certain demands, such as ‘candidate who has work experience in the Fortune Global 500 companies’, ‘candidate who is graduated from the Lvy League’. 
The traditional keyword search just can screen the resumes which include the speci?c name such as ‘Google’, ‘Facebook’, ‘Pace © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 649–658, 2019. https://doi.org/10.1007/978-3-030-02686-8_49 University’ and so on, but it cannot screen resumes by using ‘Fortune Global 500 companies’. This paper proposed a Semantic-based Resume Screening System (RSS), which is introduced an Ontology Knowledge Base. The RSS could improve the prescre- ening of the hiring process by matching the Ontology Knowledge Base. Since 2013 Pace University has developed Pace XML Validator [1], this validator greatly improves the e?ciency of the XML ?le’s veri?cation with the features of reus- able, integrated syntax and semantic validation. Because of the advantages of XML in the storage and retrieval of electronic resumes, this paper assumes that the electronic resumes need to be ?ltered that is ?lled out by the candidates on the companies’ website and stored as XML documents, and using the Pace XML Validator to set the constraints to represent the screening requirements. Therefore, the validated XML documents meet the screening requirements. Conversely, an XML document that failed the validation means it does not meet the screening requirements. This paper o?ers a proposed approach to help the RSS understand the user’s real intention more accurately. The HR can achieve the ideal candidates from the optimized result. The main contribution of the RSS is to enhance the accuracy and e?ciency of the electronic resumes screening process. In the following of this paper, the related work of RSS system will be introduced in Sect. 2, and the approach on how to create a knowl- edge base will be discussed in Sect. 3. The details of the system framework and imple- mentation will be illustrated in Sect. 4. Finally, we will make a conclusion of this project. 2 Related Work 2.1 Ontology-Based Knowledge Representation Knowledge Representation is the ?eld of study concerned with using formal symbols to represent a collection of propositions believed by some putative agent [2]. In a general sense, knowledge representation is a set of conventions for describing the world and it is the symbolization, formalization, or modeling of knowledge. From the perspective of computer science, knowledge representation is a general method to study the feasibility and validity of computer to represent knowledge. It is a strategy of representing human knowledge as a data structure and a system control structure of machine processing. More speci?cally, the knowledge can be de?ned as understanding, facts, information and description for some real or imaginary entity. In other words, in the ?eld of computer science, Knowledge Representation means that let the machines can understand. At present, the research on knowledge representation and organization method is mainly composed of frame expression, generative expression, object-oriented expression and ontology-based expression. Ontology-based knowledge representation is getting more and more attention. The concept of ontology originated in the ?eld of philosophy, which is de?ned as ‘a systematic description of the objective existence in the world’ is a systematic explanation or explanation of objective existence and concerns the abstract essence of objective reality [3]. With the development of arti?cial intelligence, Knowl- edge Representation was given a new de?nition in the ?elds of AI and computer science. 
Ontology is an integration tool for application and domain knowledge. It is a collection of concepts in a certain domain and relations among concepts, and the relationship 650 Y. Hou and L. Tao re?ects the constraints and connections among concepts. Ontology-based knowledge representation can ensure the consistency and uniqueness of knowledge sharing in the process of sharing, and can fully express the complex semantic relations between knowledge. Therefore, ontology can solve a large number of knowledge exchange and disordered sharing situation to maximize the sharing and reuse of knowledge. The use of ontology formal knowledge representation can easily access knowledge semantic information. Speci?cally, ontologies emphasize the relationships between entities and express. It can re?ect these relationships through a variety of knowledge representation elements. These elements are also called meta toms that includes concept, attributes, relations, functions, axioms and instance. Therefore, ontology has been widely used in many ?elds. 2.2 Pace Schematron XML Validator Extensible Markup Language (XML) is a markup language that de?nes a set of rules for encoding documents in a format that is both human-readable and machine-readable. The XML can be used to mark data, de?ne data types, and it is a source language that allows users to de?ne their own markup language. The main features of the XML are: (1) Convenient extensibility. XML allows organizations or individuals to create a collection of tags that suit their own needs, and these collections of tags can quickly get used to the Internet. (2) Strong structure. The logical structure of XML document data is a tree-like hierarchy. Each element in a document can be mapped to an object, and corresponding attributes and methods are also available. Therefore, it is suitable for the use of object-oriented programming to develop applications that process these XML documents. (3) Good interaction. When users interact with applications, using XML makes it easy to locally sort, ?lter, and perform other data operations without interacting with the server which relieves the burden on the server. (4) Powerful Semantic. In XML documents, people can use certain tags to de?ne the relevant semantics for data, which not only greatly improves the readability of the document for human beings, but is also easy to be read and used by machines. Therefore, the information exchange between di?erent devices and di?erent systems can be easy. Because XML describes the meaning of data content by tagging it and separates the display format of the data, the search for XML document data can be performed simply and e?ciently. In this case, the search engine does not need to go through the entire document, but only to ?nd the contents of the speci?ed tag on it. In this way, it is no longer di?cult to browse the Internet, as each page is displayed exactly what the viewer wants. In the electronic resume, for di?erent candidates, some speci?c markers are ?xed, such as name, age, graduation school, work experience, etc., but only the content is speci?ed by these marks. Therefore, combined with the characteristics of XML, the storage of electronic resumes in the form of XML documents has become the most e?ective method. Since 2013, Pace University developed an integrated syntax/semantic validator which is a Pace XML Validator. 
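Before turning to that validator, the tag-scoped retrieval described above can be illustrated with a short sketch. Python's standard ElementTree is used here purely for illustration (it is not the Pace XML Validator), and the resume structure and element names are hypothetical rather than the schema assumed by the authors.

import xml.etree.ElementTree as ET

# Hypothetical resume structure; element names are illustrative only.
resume_xml = """
<resume>
  <name>Mike</name>
  <education>
    <school>Pace University</school>
    <degree>MS Computer Science</degree>
  </education>
  <work>
    <company>IBM</company>
    <title>Software Engineer</title>
  </work>
</resume>
"""

root = ET.fromstring(resume_xml)

# Tag-scoped search: only the <education> subtree is inspected,
# not the whole document.
schools = [s.text for s in root.findall("./education/school")]
print("Pace University" in schools)        # True

# The same idea scoped to <work> answers a different question.
companies = [c.text for c in root.findall("./work/company")]
print("IBM" in companies)                  # True

Because the query is scoped to a tag, only the relevant subtree is inspected rather than the entire document, which is the property the paper relies on for efficient screening.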
Schematron [4] is a popular rule-based XML dialect that allows us to specify such co-constraints for a class of XML documents and then use a standard Schematron validator to validate the co-constraints without coding. Over the past decade, the standard implementation of the Schematron validator is to use a standard Semantic-Based Resume Screening System 651 XSLT stylesheet [5, 6] to transform a Schematron document into a new validator XSLT stylesheet, and then use the latter to validate the XML instance documents. However, the current industry practice of XSLT-based Schematron validation may produce invalid results and cannot be easily integrated with other system components [1]. Thus, Pace University designed and implemented a validator as a reusable software component based on DOM Level 3 XPath. It supports all key features of Schematron ISO [4] including abstract rules and abstract patterns, network integration through web services, and event-driven loose-coupling. 3 Create Knowledge Base Ontologies are usually organized in taxonomies and typically contain modeling primi- tives such as classes, relations, functions, axioms and instances [7]. Therefore, the ontology design of knowledge base is the design of concept, relationship and instance. This paper will illustrate the design of using the domain knowledge base to analyze the resume information, which could help the users to ?nd the ideal candidates more accu- rately. At present, the design of the ontology for semantic analysis of resume information is mainly composed of classes and instances. The classes in ontology have two functions: (1) Describe the meaning of class and the knowledge contained in the class; (2) De?ne subclasses and instances of the class. The di?erence between an instance and a class is that the class could be a name or some attributes that describe an instance within a collection, but the instance is a member of the collection. For example, the smartphone is a class, and the iPhone 8 is an instance of this class. By matching the domain knowl- edge base, the system can set the constraints in Pace XML Validator more accurately, so that the system can achieve the better result to the users. This paper uses Protégé as an ontology modeling tool to create a knowledge base. Protégé is a free, open source ontology editor and a knowledge management system [8]. Protégé provides a set of behavior-oriented systems based on a knowledge model structure to support the ontology construction of various expressions (such as an OWL, RDF, Dublin Core and so on). In the Protégé editor, the ontology structure is shown in the hierarchical directory structure. It is straightforward for the maintenance operations of the ontology (such as adding classes, subclasses, attributes, instances). Therefore, there is no need to concern the speci?c ontology language; it only needs to design a domain ontology model at the conceptual level. The example used in this paper is that an HR needs to ?nd the candi- dates that ‘graduated from Lvy League’, or ‘has work experience in Fortune Global 500 companies’. A knowledge base will be designed based on this assumption. First, the ‘Lvy League’ and ‘Fortune Global 500 companies’ are derived from the class Thing. The university such as ‘Havard University’, ‘Columbia University’ and so on, they belong to the class of ‘Lvy League’, and the company such as ‘IBM’, ‘Apple’, ‘Micro- soft’ and so on, they are the instances which belong to the class ‘Fortune Global 500 companies’. 
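A compact way to see the role this class/instance design plays in screening is the sketch below. The dictionary merely stands in for the OWL knowledge base (a real implementation would read the ontology with a library such as Jena, as the authors do), and the instance list is abbreviated and illustrative.

# Minimal stand-in for the ontology: class name -> instances.
knowledge_base = {
    "fortune global 500 companies": ["IBM", "Apple", "Microsoft"],
}

def expand_requirement(requirement: str) -> list:
    """Map a screening requirement to the concrete keywords that should be
    searched for in the resumes. Unknown terms are returned unchanged, as
    the RSS does when a term is not a class in the OWL file."""
    key = requirement.strip().lower()        # mirrors the preprocessing step
    return knowledge_base.get(key, [requirement])

print(expand_requirement("Fortune Global 500 Companies"))
# ['IBM', 'Apple', 'Microsoft']  -> later turned into Schematron constraints
print(expand_requirement("Pace University"))
# ['Pace University']            -> no expansion needed

The expanded names are what the later steps of the framework turn into constraints for the Pace XML Validator.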
By establishing the knowledge base in this ?eld, the system will understand how to set the constraints in Pace XML Validator when it meets the requirements such as ‘having a work experience in Fortune Global 500 companies’. Therefore, the system’s ability will be enhanced. 652 Y. Hou and L. Tao 4 Design of Semantic-Based Resume Screening System Framework The Semantic-based Resume Screening System (RSS) is composed of four parts: No. 1: Reading the requirements from the users, and the RSS will conduct a preprocessing for later operation. No. 2: Based on the pre-processed input, the system will match the knowledge base created previously, then generate the contents from the resumes that RSS will screen later. No. 3: Based on the contents, the RSS will generate constraints automatically when the RSS invokes the Pace XML Validator. That is, the RSS will generate a Schematron ?le (.sch ?le). No. 4: The RSS will invoke the Pace XML Vali- dator to validate each resume in the resume folder, then return the veri?ed documents that the users want to achieve. Figure 1 is the design of Semantic-based Resume Screening System Framework. Fig. 1. The design of semantic-based resume screening system framework. 4.1 Preprocessing When a user enters a requirement, we need to preprocess the input ?rst so that the later operation can be more convenient for these requests. The main preprocessing is to ignore the capitalization of the letter input, as well as the space input. As we know, when users enter the requirements, the ?rst letter of a university or organization always needs capi- talization in the expression. However, the expression of some classes and instances in the knowledge base may not be stored in the form of capital letters. In order to avoid errors caused by inconsistencies during the operation process, the RSS will ignore the capitalization of the letter input in the preprocessing part. In the OWL ?le, the spaces are often saved with the ‘symbol’. Figure 2 is an example of an OWL ?le. From this example, we can see that the spaces in the class ‘fortune global 500 companies’ and the class ‘lvy league’ are represented by the ‘symbol’. Therefore, the RSS will preprocess the spaces, in order to avoid the error during matching the knowledge base. Semantic-Based Resume Screening System 653 Fig. 2. An example of OWL ?le. 4.2 Matching the Knowledge Base In this section, the RSS will use Jena to read and identify the established knowledge base, which is to use Jena to read and analyze the saved OWL ?le. Apache Jena (or Jena in short) is a free and open source Java framework for building semantic web and Linked Data applications. The framework is composed of di?erent APIs interacting together to process RDF data [9]. First, the RSS will match the preprocessed input with the OWL ?le read by Jena, thus, the RSS will understand whether the user needs the knowledge base’s assist. For example, if the user wants to ?nd out candidates who have experience working with Fortune Global 500 companies, the RSS can know that ‘Fortune Global 500 companies’ means that candidates should have work experience in companies such as IBM, Apple, Microsoft and so on because the instances ‘IBM, Apple, Microsoft’ belong to the class ‘fortune global 50 companies’ in the knowledge base. If a user’s requirement does not need the knowledge base’s assist, for example, a user wants to ?nd candidates who graduated from Pace University, the RSS may ?nd that ‘Pace University’ is not one of the classes in the OWL ?le. 
In that case, the RSS returns the result 'Pace University' directly, without using the knowledge base.

4.3 Generating the Schematron File

Through the previous step, the RSS understands the details of the user's search demand. Next, the RSS automatically generates a Schematron file based on the keywords returned in the previous section. The Schematron file sets the constraints on the XML files; the Pace XML Validator uses it to verify whether an XML file meets the constraints. For example, consider a candidate named Mike whose resume is saved in XML format; Figure 3 shows Mike's resume as an XML file. If an HR user wants to find candidates who graduated from Pace University, the keyword generated in Sect. 4.2 is 'Pace University', and the RSS generates the corresponding Schematron file based on this keyword to set constraints on the XML files. Figure 4 shows a Schematron file generated from the keyword 'Pace University'. In this file, we restrict the XML file as follows: search for the content 'Pace University' under the 'education' element; if the 'education' element contains 'Pace University', the verification passes, otherwise it fails.

When an HR user wants to find candidates who have work experience in Fortune Global 500 companies, the RSS understands, by matching the knowledge base, that 'work in Fortune Global 500 companies' means 'working in companies such as IBM, Apple, Microsoft and so on'. Thus, the keywords are company names such as 'IBM', 'Apple' and 'Microsoft', and the RSS generates the corresponding Schematron file based on these keywords. Figure 5 shows a Schematron file with the constraints for 'work in Fortune Global 500 companies'. In this file, we restrict the XML file as follows: search under the 'work' element for any of the Fortune Global 500 company names; if the 'work' element contains any of these names, the verification passes, otherwise it fails.

Fig. 3. Mike's resume.

Fig. 4. The Schematron file with the keyword 'Pace University'.

Fig. 5. The Schematron file for 'work in Fortune Global 500 companies'.

4.4 Invoking the Pace XML Validator

In this paper, we assume that all candidates' electronic resumes are saved as XML files in a specific folder. In this step, the RSS invokes the Pace XML Validator and uses the Schematron file generated in the previous step to verify the XML files individually. Once the validation has completed, the RSS returns the verified XML files. Figure 6 shows three resumes saved in one folder. If a user wants to find candidates who graduated from Pace University, the RSS returns the verified XML files 'Alice.xml' and 'Tom.xml' after screening; Figure 7 shows the results. If a user wants to find candidates who have work experience in Fortune Global 500 companies, the RSS returns the verified XML files 'Mike.xml' and 'Tom.xml' after screening; Figure 8 shows the results. The user can now find their ideal candidates from the screening results.

Fig. 6. The example resumes.

Fig. 7. The results of the RSS screening for 'Pace University'.

Fig. 8. The results of the RSS screening for 'Fortune Global 500 Companies'.
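A hedged sketch of steps 3 and 4, generating a Schematron file from the keywords and screening the XML resumes, is shown below. The rule structure is a plausible ISO Schematron pattern rather than the exact files of Figs. 4 and 5, and lxml's Schematron validator stands in for the Pace XML Validator.

```python
from pathlib import Path
from lxml import etree
from lxml.isoschematron import Schematron

def build_schematron(element, keywords):
    # One rule per target element; pass if any keyword appears in its content
    tests = " or ".join(f"contains(., '{kw}')" for kw in keywords)
    sch = f"""<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="keyword-check">
    <rule context="//{element}">
      <assert test="{tests}">required keyword not found</assert>
    </rule>
  </pattern>
</schema>"""
    return Schematron(etree.XML(sch))

def screen_resumes(folder, element, keywords):
    schematron = build_schematron(element, keywords)
    for resume in sorted(Path(folder).glob("*.xml")):
        if schematron.validate(etree.parse(str(resume))):
            yield resume.name

# list(screen_resumes("resumes", "education", ["Pace University"]))
# list(screen_resumes("resumes", "work", ["IBM", "Apple", "Microsoft"]))
```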
5 Conclusion

In this paper, we showed that keyword-based search cannot satisfy the current needs of electronic resume screening, and we proposed a Semantic-based Resume Screening System (RSS). Based on the knowledge base, this system greatly enhances the ability to understand screening requirements, and it improves the efficiency of XML document validation through the application of the Pace XML Validator. The approach proposed in this paper can improve the efficiency and accuracy of resume screening; therefore, using it can greatly improve the efficiency of a company's hiring process. In the future, our work will introduce knowledge graphs to improve the capability of the knowledge representation, because the ontology primarily supports only the subclassOf (is-a, or inheritance) relation. Various other relations, such as part-of, are essential for representing information in various fields, including all engineering disciplines [10].

References

1. Tao, L., Golikov, S.: Integrated syntax and semantic validation for services computing. In: 2013 IEEE 10th International Conference on Services Computing (2013)
2. Brachman, R.J., Levesque, H.J.: Knowledge Representation and Reasoning. Morgan Kaufmann, San Francisco (2004)
3. Wu, J.: The construction of ontology-based domain knowledge base. Sci. Technol. Innov. Herald 30, 250–252 (2010)
4. ISO: Information technology – Document Schema Definition Languages (DSDL) – Part 3: Rule-based validation – Schematron, March 2013. http://standards.iso.org/ittf/PubliclyAvailableStandards
5. Dodds, L.: Schematron: validating XML using XSLT, March 2013. http://www.ldodds.com/papers/schematron_xsltuk.html
6. Jelliffe, R.: Schematron Implementations, March 2013. http://www.schematron.com/links.htm
7. Gruber, T.R.: A translation approach to portable ontology specifications. Knowl. Acquis. 5, 199–220 (1993)
8. Musen, M.A.: The Protégé Project: a look back and a look forward. AI Matters 1(4), 4–12 (2015)
9. Apache Jena: Getting started with Apache Jena. https://jena.apache.org/getting_started/index.html
10. Patel, K., Dube, I., Tao, L., Jiang, N.: Extending OWL to support custom relations. In: 2015 IEEE 2nd International Conference on Cyber Security and Cloud Computing, New York, USA, November 2015

The Next Generation of Artificial Intelligence: Synthesizable AI

Supratik Mukhopadhyay1, S. S. Iyengar2, Asad M. Madni3, and Robert DiBiano4

1 Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA 70803, USA, supratik@csc.lsu.edu
2 School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA, iyengar@cs.fiu.edu
3 Department of Electrical and Computer Engineering, University of California, Los Angeles, CA 90095, USA, ammadni@ee.ucla.edu
4 Department of Computer Science, Louisiana State University, Baton Rouge, LA 70803, USA

The authors acknowledge the sponsorship of NASA Ames Research Center, the US Department of Agriculture, the National Science Foundation, the US Department of Defense and the US Army Research Lab in their research.

Abstract. While AI is expanding to many systems and services, from search engines to online retail, a revolution is needed to produce rapid, reliable "AI everywhere" applications through "continuous, cross-domain learning". We introduce Synthesizable Artificial Intelligence (SAI) and discuss its uniqueness through its five advanced "abilities": (1) continuous learning after training by "connecting the dots"; (2) measuring the quality of success; (3) correcting concept drift; (4) "self-correcting" for new paradigms; and (5) retroactively applying new learning for the development of "long-term self-learning". SAI can retroactively apply new concepts to old examples, "self-learning" in a new way by considering recent experiences, similar to the human experience. We demonstrate its current and future applications in transferring seamlessly from one domain to another, and show its use in commercial applications, including engine sound analysis, providing real-time indications of potential engine failure.

Keywords: Artificial intelligence · Synthesizable Artificial Intelligence · IBM Watson · Natural language processing · Self-learning

1 Introduction

IBM's Watson Analytics is no longer just a Jeopardy-playing genius. Watson has embarked on a journey of knowing, going far beyond its initial capacity for Jeopardy question answering. Watson Analytics has made great strides employing the
Natural Language Processing User Interface (NLP-UI) as a novel approach to the analysis of business problems, allowing even unseasoned business users an opportunity to analyze industry and personal datasets. The diversity of challenges in AI and their specific embedded complexities should not obscure the fact that the heart of the subject belongs to real-time reasoning.

For the last decade, researchers in Artificial Intelligence (AI) have made exponential progress in applications across broad industry areas. Autonomous vehicles from Google and others have registered countless miles on American roads. AI systems are interpreting radiology images and diagnosing diseases with the same skill level as experienced radiologists and doctors. AI is influencing every aspect of human life, from hearing aids to stock trades. So, is AI ready for primetime, or are we already there? We think the state of the art in AI today is at the same stage that software engineering was in the early 1960s. During that time, software could only handle small problems in diverse domains (e.g., numerical analysis, personnel management, etc.): there was no way in which complex software systems involving millions or billions of lines of code could be created to tackle real-world problems. In the same way, today's AI systems are limited to solving smaller (but harder) problems like image recognition and automatic question answering. Scaling such systems to address large, complex tasks such as automated drug design, air traffic control, or running an entire enterprise remains a challenge. Software engineers invented abstractions embodied in object-oriented techniques and principles of software reuse to revolutionize productivity; today, large software systems are no longer developed from scratch: they are built by reusing existing code through subclassing and overriding methods. A variety of software abstractions are available today to enable code reuse, from design patterns to frameworks. Thanks to this methodology, software is now all-encompassing, influencing every walk of human life from power systems to retail. Is AI waiting for a similar abstraction revolution?

While AI has been part of many systems and services, from search engines to online retail, to realize the vision of "AI everywhere", a revolution similar to that which occurred in software is needed. Despite all the recent successes of AI, many questions remain unanswered. In many ways, Watson represents a solution to many problems, yet it still has some limitations in moving to a new domain.
Watson cannot hit the ground running in a completely new domain, automatically deploying and reconfiguring itself online when situations change. The machine learning system of Watson is very good, but it cannot auto-tune to a problem domain instantaneously. The concept of domain change in many of these applications is still a problem of interest. Researchers throughout the AI community have been asking, "How do you improve productivity in the creation and deployment of AI systems?" In other words, how can we produce AI systems rapidly and reliably as the applications of AI expand from understanding specific scenes to serving societal and business needs in critical areas?

The authors and their team have introduced an alternative approach through Synthesizable Artificial Intelligence, or SAI technology. Previous work by Mukhopadhyay, Iyengar et al. on the Cognitive Information Processing Shell [1] served as an impetus for this approach. SAI is unique among AI systems by virtue of its five technological advances or "abilities": (1) continuous learning after training by "connecting the dots"; (2) measuring the quality of success; (3) correcting concept drift; (4) "self-correcting" for new paradigms; and (5) retroactively applying new learning for the development of "long-term self-learning." SAI can retroactively apply new concepts to old examples, "self-learning" in a new way by considering recent experiences, similar to the human experience. In this paper, we demonstrate how our work on SAI has overcome limitations of other AI systems, and its current and future applications in transferring seamlessly from one domain to another. We show its use in current commercial applications, including engine sound analysis, where it provides real-time indications of potential engine failure, and its future uses in automatic drug discovery.

2 Hierarchical Fractal Architecture of SAI Agents and Related Work

Currently, different AI systems specialize in single, specific tasks, determined by the type of data on which they were trained in advance. SAI is unique in measuring the applicability of a given agent (neural network), or cluster of neurons within a network, to a specific task in real time. Thus, SAI can detect when the input changes to something the network is not equipped to deal with and can draw from a wide variety of related and unrelated data to activate different neural clusters that can be used to rapidly understand the new input.

Adapting to new types of input during execution time is a difficult problem. Obviously, there cannot be true learning to predict labels without at least a few ground-truth labels to check against. That said, unsupervised methods like self-organizing maps or autoencoders together with clustering can work in some situations. Unfortunately, these unsupervised methods require a lot of data, so they cannot be used to adapt rapidly to a new circumstance in real time. By learning the data distribution without labels, or automatically organizing the data into clusters and assigning arbitrary labels, the data can be correctly understood with only a handful of additional ground-truth examples. By effectively utilizing neural clusters trained on other problems, we solve this problem, enabling unsupervised learning that can also adapt to new circumstances immediately. As SAI learns new concepts from other problems, we can retroactively apply these concepts to old labeled data, allowing continual improvement as we gain a better understanding of old data.
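The paper does not give a formula for this applicability measure, so the following sketch assumes, purely for illustration, that a segment is applicable when its activation statistics on new data resemble those recorded during its training.

```python
import numpy as np

def applicability(segment_fn, new_batch, train_mean, train_std, eps=1e-8):
    """segment_fn: maps a batch of inputs to a (batch, features) activation matrix.
    train_mean / train_std: per-feature statistics recorded when the segment was trained.
    Returns a score in (0, 1]; higher means the learned concepts transfer better."""
    acts = segment_fn(new_batch)                  # (batch, features)
    z = (acts.mean(axis=0) - train_mean) / (train_std + eps)
    distance = np.sqrt(np.mean(z ** 2))           # normalized activation shift
    return float(1.0 / (1.0 + distance))

# A segment whose features still fire the way they did during training scores
# near 1; a segment facing out-of-distribution input scores near 0 and can be
# bypassed or replaced.
```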
Such retroactive application is similar to the way humans perfect their skills in complex tasks.

SAI determines the applicability of a given neural network, network layer, or feature map to the analysis of a given input. When applied properly, neural networks internally organize input data into increasingly high-level abstract information. By analyzing the response of a network segment to known and unknown information, we can develop a relation to determine whether the abstract concepts learned by a segment generalize well to a given piece of data. This allows us to immediately draw from a diverse 'segment library' of learned concepts when analyzing a new problem. Membership in the segment library is determined by maximum applicability to any problem, not applicability to the current problem, so useful concepts are always retained. Humans can separate their processing based upon natural features by filling in the open parts through learned experience, enabling them to transfer the information to new experiences (Fig. 1).

Fig. 1. Feature response separation.

SAI agents are organized into a series of progressively more task-specific network layers, where each layer can be connected to multiple sub-layers (Fig. 2).

Fig. 2. Hierarchical fractal architecture.

We refer to a layer plus all its possible sub-layers as a lobe. Inputs such as raw sensory data flow into the network. Low-level lobes apply to most or all problems and start to process the data; at this point, the data is passed on only to the sub-lobes where it is most applicable. As data flows through the network, only sections that are capable of processing that type of data are activated, while non-relevant sections are bypassed. Eventually the fully processed and understood data is routed to a final high-level lobe and produces a result. Different lobes can be associated with different data types, but different high-level lobes attached to the same mid-level lobe can also be associated with different tasks for the same data. This allows us to efficiently use the same type of data differently depending on the task, while still sharing maximum knowledge between tasks.

When we reach a point where no sub-lobe of a given lobe is applicable to a given task, we 'grow' a new sub-lobe starting from that point, created from the most applicable available network segments from previous tasks. As we train or learn about the new task, these segments may diverge from their original values as the network improves at the task. If this happens, they are also added to our segment library for future use. By having a way to measure how well a network segment applies to a given input, we can instantly transfer knowledge learned from other problems to the current situation. The ability to transfer knowledge effectively between very different tasks allows rapid adaptation to new and unusual conditions. Conversely, when a given segment no longer applies to a given set of inputs, it can be saved to a knowledge library rather than overwritten, allowing it to be recalled if it becomes relevant again in the future. This gives us an effective method for lifelong self-learning, where potentially valuable concepts that are irrelevant to the current problem are unused, but not forgotten. SAI's hierarchical structure allows us to efficiently share knowledge between different tasks, while retaining a segment library that eliminates the problem of catastrophic forgetting.
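A toy sketch of this routing-and-growing behaviour is given below; the Segment and Lobe interfaces, the threshold, and the growth rule are illustrative assumptions, not the authors' implementation.

```python
class Segment:
    """Minimal stand-in for a trained network segment (illustrative only)."""
    def __init__(self, process_fn, score_fn):
        self.process = process_fn          # features -> features
        self.applicability = score_fn      # features -> score in [0, 1]
    def clone(self):
        return Segment(self.process, self.applicability)

class Lobe:
    def __init__(self, segment, sub_lobes=None):
        self.segment = segment
        self.sub_lobes = sub_lobes or []

def route(lobe, x, segment_library, threshold=0.5):
    x = lobe.segment.process(x)
    if not lobe.sub_lobes:
        return x                           # reached a high-level output lobe
    best_score, best = max(
        ((sub.segment.applicability(x), sub) for sub in lobe.sub_lobes),
        key=lambda pair: pair[0],
    )
    if best_score < threshold:
        # No applicable sub-lobe: grow one from the most applicable library segment
        seed = max(segment_library, key=lambda seg: seg.applicability(x))
        best = Lobe(seed.clone())
        lobe.sub_lobes.append(best)
    return route(best, x, segment_library, threshold)
```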
Our method allows valuable learned concepts to be identified; when they become irrelevant due to changes in conditions or concept drift, they can be bypassed, or phased out and saved for use in later tasks. Because our system can measure the applicability of network weights to a specific piece of data, we can swap out, or reroute around, currently useless sections of our network rather than overwriting them. This eliminates the danger of catastrophic forgetting.

We believe that biological learning systems can adapt to new conditions so rapidly because of two mechanisms: associative memory and analogy. Associative memory is a mechanism by which training data can be effectively transferred between tasks that are only loosely related, by associating part of one task with another and using the previously learned knowledge for the new task. Analogy is more complex, and likely only humans are capable of it. Analogy involves forming an association between two relations involving the data instead of two data items. It is very powerful in that it allows us to infer complex relations from sparse data. When conditions change abruptly, a biological system will first try to adapt to the new conditions by looking for similar conditions in the past in a different context (e.g., if we are trying to detect cars and one goes through a shadow, other objects have been in or gone through shadows in past trainings, so we leverage the now very relevant shadow-resistant features from those trainings to correct the problem rapidly without needing to build a dataset of cars going through shadows), or by dropping the part of the classification criteria that has become unreliable in favor of a subset (e.g., blue paint was spilled on a cat, and our method now shows the standard color-based map responses as not applicable; the shape-based responses are still applicable, so we swap out the now-irrelevant color-based cat features, archiving them in the segment library, and quickly learn to classify cats without relying on color). SAI integrates both methods.

SAI can be used to 'grow' new lobes on a nodal network agent when new useful features are discovered and to determine which lobes to branch based on the inclusion of features applicable to the current problem (Fig. 2). The issue of balance applies to determining the optimal feature set to assign to each lobe. If a lobe became so large that it was computationally infeasible to process data through it, it would be split into two smaller, balanced lobes. Because our network can bypass non-applicable layers and their sub-layers, we avoid having to make a hard tradeoff between knowledge acquisition and memory retention.

Our SAI architecture can efficiently treat the same data differently based on context and system goals; the same lower-level lobe will be associated with different higher-level lobes for different ways of handling its output. These can be activated selectively based on system goals, or simultaneously to accomplish two tasks with only incrementally more processing power than is required for one. Similarly, several high-level lobes may be associated with different versions of a drifting concept, or different noise types. If the system goals are not explicitly given, the route the data takes through the network is determined primarily by lobe applicability, and output paths represent system goals.
This means the system has the ability to choose its own goals based on the situation if necessary. Changing architecture based on sensory input is a fundamental property of SAI, in that data is routed only through lobes capable of processing it. As with all cognitive architectures, memory and computation are different aspects of the same connections and weights. Sensory inputs are first processed in general areas of the network and then routed through dedicated areas based on the specific data type and target task. Instincts can be emulated by training a network segment to emulate a hard-coded rule and adding that rule to the segment library. That allows it to be swapped in, either manually or automatically, where applicable, and allows the system to learn to refine or ignore the instinctual rule where necessary. SAI has a library of network segments to draw on, and segments are stored by maximum value in any situation, not current value; therefore, catastrophic forgetting cannot happen. SAI represents a new paradigm in machine learning, able to draw on diverse knowledge to adapt to any new situation rapidly.

3 Self-learning

Typical AI systems start out at some initial conditions, improve at their target task iteratively during training time until they reach some asymptotic maximum quality, and are then frozen in that state and fielded. A human expert, however, can continue to gain expertise at a task long after they are finished being trained by an expert. Even when a human is the best in the world at a given task, and no better expert exists, they can still continue to gain expertise on their own. How is this possible?

In one class of tasks, the human can easily determine a success/failure labeling or quality measure accurately, and therefore generate their own labeled data after deployment. They then use reinforcement learning to continuously improve at the task. Machine learning can already do this quite well, assuming a system can be trained to estimate the quality measure, so we will neglect this case. In another case, the task we are trying to improve is a labeling task, so the system or human can never really be sure it is improving at the task after deployment without the occasional ground truth. Even for human experts, something akin to concept drift is possible. Nevertheless, a human expert will gain a better and better understanding of the task via unlabeled training and will be able to correct any concept drift from a single example. Existing machine learning systems generally have the capability to correct for concept drift via unlabeled plus labeled examples, but only our SAI architecture provides a mechanism to detect the concept drift automatically, so it knows when to ask for more examples. If existing lobes become inapplicable to the current tasks, the system will grow new lobes from that level that apply better to the current problem and use them instead. This is analogous to a human whose old way of doing things isn't working anymore experiencing a paradigm shift. The system may still need some ground truth to get a handle on the new situation, but it would realize on its own that the old learning was failing and that the results were no longer reliable, and it could ask for labeled examples to regain its bearings.

Our system also demonstrates self-learning in another way. The system can retroactively apply new concepts to old examples, learning new ways to understand long-known tasks in light of recent experiences.
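Returning to the automatic drift detection described above, a toy monitor (reusing the illustrative Lobe/Segment interfaces from the earlier sketch) might look like this; the threshold and return structure are assumptions.

```python
def monitor(active_lobes, batch, threshold=0.5):
    """Flag concept drift when the active lobes' applicability drops too low."""
    drifted = [lobe for lobe in active_lobes
               if lobe.segment.applicability(batch) < threshold]
    if drifted:
        # Old learning is failing: archive the affected segments in the library
        # and ask for a few labeled examples so new lobes can be grown.
        return {"status": "concept_drift", "request_labels": True,
                "archive": [lobe.segment for lobe in drifted]}
    return {"status": "ok", "request_labels": False, "archive": []}
```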
Retroactively applying new concepts in this way means our system can continue to get better at a task long after labeled data on the task has stopped coming in, by transferring useful concepts from other tasks. This sort of long-term self-learning is one of the ways human experts gain the highest levels of expertise. Let's look at two potential applications for this advanced technology.

3.1 A Practical Example

To illustrate the usefulness of SAI's advanced architecture, consider a video recognition network for classifying clips from musicals. The network would be trained for several tasks related to classifying the clips, such as filling in the sound effects, recognizing famous actors, determining the genre, and determining whether the clip comes from the beginning, middle, or end of the story. This scenario is significant for two reasons. First, there are thousands of hours of such videos available, either already labeled or easily labeled automatically. Second, there is significant interest in these types of applications; Shazam does something similar with music.

SAI would start out using segments from one or more of the tasks, and produce an input layer, some intermediate sub-layers, and three output lobes, one for each task. If the same types of features were useful for all three problems, the network wouldn't split into these sub-branches until shortly before the final layer. Conversely, if the actor recognition task used very different features (facial features) from the sound prediction task (visual cues, gestures, and body movements), the network would bifurcate somewhere in the middle. Regardless, the early layers would contain features that applied to all three problems, while the later layers would contain problem-specific features. The feature library would contain both.

This illustrates a commercial application of SAI, but its real strength lies in its potential use to track suspicious behavior. Suppose we are training a system to detect pickpockets (or terrorists) by watching a video feed. There are not thousands of hours of data available on this; it is not publicly available and well labeled. We may have a few tens of examples of pickpockets on video if we are lucky. Classically, this would make the problem infeasible; a computer couldn't solve it, even though a human might be able to without ever having seen a single real example. Humans can transfer knowledge from millions of other, more innocent interactions in their experience to understand what is happening. The human already knows that the hands are used to grasp objects and are of interest, that clothing has pockets in it, usually in the same areas, that someone suddenly changing direction might be significant, and so on. Similarly, SAI will look at the handful of 'pickpocketing' interactions and search its feature library for anything applicable. The sound prediction features will be attuned to looking for small hand gestures (to predict fingers snapping) and leg movements (to predict footstep sounds). The segment that predicts where along the timeline a clip comes from does so by learning to estimate fatigue level from pose and timing differences. The actor recognition features would understand the meaning and significance of faces and would share these types of features with the fatigue estimation portion, which could use them to look for sweat on faces.
Some of these features (hand movements, stress level from pose) would have higher than baseline applicability to pickpocketing detection, and we could immediately identify and use them. So, when SAI builds the initial path leading to our new 'pickpocketing' output lobe, it already understands a great deal about the meaning and context of the scenes before even training with the 'pickpocketing' samples (Fig. 3).

Fig. 3. Pickpocket scenario demonstrating the breakout of video features into characteristic lobes and storage in the main Segment Feature Library for rapid future learning (Photo courtesy of Ili Simhi).

The new series of lobes would share low-level features with the existing network, and even the high-level features would be initialized from the most applicable members of the feature library. At that point the network would proceed to learn from the 'pickpocketing' samples, and if any features changed significantly, the network would know it had learned a useful new concept and would add it to the segment library. New concepts learned this way are retroactively applied to old problems; in this case, new concepts learned from pickpocketing detection could be checked for applicability to music classification. This would allow our network to continue learning about a problem long after data on that problem has stopped coming in, and therefore enable a better understanding of "old memories" in light of new experiences.

3.2 Transfer Learning

Due to transfer learning, analogical reasoning, and automated tuning, SAI can easily transfer from one domain to another, unlike other AI systems, which cannot be readily deployed to a new domain or learn from one another. In SAI, for example, an "agent" performing the task of understanding imagery from Synthetic Aperture Radars (SAR) can gain knowledge through transfer from an agent performing semantic segmentation of CAMVID imagery, or from a VGG-16 model pre-trained on ImageNet [2]. A "core sample" from an agent previously trained on one task can be used to train a new agent for a different task [2]. This strategy helps avoid the need to train an entire network on a large dataset and improves overall performance. For example, training a large VGG-16 network on a reasonably large dataset takes a long time; SAI avoids that by using a VGG-16 model pre-trained on the ImageNet dataset and extracting a "core sample" from it to create a new "agent". Not only does this strategy save training time, but it also helps create a trained agent from a relatively smaller training dataset. In Watson, such an agent would need to be built from scratch by training on a large labeled (SAR) dataset.

Another application transfers the knowledge obtained from recognizing objects in the ImageNet dataset, which contains no characters, to segmenting and classifying handwritten foreign characters. Because ImageNet is drawn from a large and diverse dataset, its features can be assumed to be more general purpose, able to represent many types of shapes and textures equally well. While a model trained on it may not have the capacity to directly recognize foreign characters, it should have the ability to recognize many common simple structures in a wide array of image conditions, including noise. This is the knowledge that we want to transfer out of it and combine with our own knowledge of foreign characters.
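A hedged sketch of the "core sample" idea follows: the early layers of an ImageNet-pretrained VGG-16 are reused as a frozen feature extractor and a small new head is attached for the target task. The split point, head size, and class count are assumptions for illustration.

```python
import torch.nn as nn
from torchvision import models

def core_sample_agent(num_classes, cut=24):
    backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    core = backbone.features[:cut]        # "core sample": the first four conv blocks
    for p in core.parameters():
        p.requires_grad_(False)           # transferred knowledge stays fixed
    head = nn.Sequential(                 # small task-specific head to fine-tune
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(512, num_classes),      # 512 channels after the fourth block
    )
    return nn.Sequential(core, head)

# Only the head is trained, so a relatively small labeled dataset (and far less
# compute) suffices compared with training VGG-16 from scratch.
agent = core_sample_agent(num_classes=10)
```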
In general, assume that the SAI framework is asked to configure an AI engine for a task T. We have at our disposal AI engines (neural networks) for solving tasks T1, …, Tn, and some concepts learned in solving one or more of T1, …, Tn will be relevant to solving task T. Assume we are provided a labeled dataset D for training a neural network to solve task T. SAI first starts with a randomly initialized neural network for T. For each network corresponding to Ti, and for each of its layers, SAI determines the applicability of the learned concepts towards the new task T. This is done through the evaluation of a transferability metric that provides a measure between 0 and 1. SAI sorts the corresponding layers of each of the networks for T1, …, Tn in order of decreasing transferability. For each layer of the new network for T, SAI transfers the top k "relevant" weights from T1, …, Tn. Finally, SAI partitions D into two subsets: a small subset Dtrain that is used to fine-tune the network and a testing set Dtest that is used to test it. Note that both Dtrain and Dtest are also used in computing the transferability metric. The data needed for fine-tuning is only a small subset of D, so this scheme works even if D is small.

Today, large AI systems are developed and fine-tuned by companies with armies of highly paid data scientists and engineers. It takes a significant amount of time, money, and effort, together with a deluge of training data, to build and train an AI system that can operate at the level of humans in a new domain. (While deep learning techniques have eliminated the need for manual feature engineering, they have been shown not to work well, for example, for texture datasets where the inherent dimensionality of the data is high [2].) Even with this enormous force, gaps in training remain (Fig. 4).

Fig. 4. Gaps in current state-of-the-art intelligent systems.

The AI community has recognized this limitation as one of the main stumbling blocks hindering progress and preventing AI from positively influencing important areas of human endeavor. The fast-changing nature of today's world, where M&As happen in the blink of an eye, new diseases appear at an alarming rate (e.g., Zika), political landscapes change overnight, and natural disasters come out of the blue, makes this slow mobility of AI across domains a formidable problem. For years, scientists have wrestled with a variety of solutions to this problem, such as "transfer learning." Most AI systems today rely on transfer learning to bring the experience of an AI system in one domain to bear upon problems in another. This technique, however, ignores the tremendous amount of human experience already available in the new domain. Compare this to the way a person explores a new city. The person will combine previously acquired skills, such as map reading, with the knowledge obtained from questioning locals about the best restaurants, museums, and shops, allowing them to navigate and enjoy the city even though the city is new and the tourist may not speak the local language. This dynamic combination also enables a person to deal with unforeseen events such as road closures and detours. Humans have this innate ability to use this combination in their daily life to adapt to new situations and tasks.
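Returning to the layer-transfer procedure for task T described at the start of this section, the following numpy sketch illustrates it; the transferability metric itself is not specified in the paper, so it is passed in as an assumed callable, and layers are simplified to plain weight matrices of matching shapes.

```python
import numpy as np

def transfer_weights(new_net, source_nets, transferability, D, k=3, train_frac=0.2):
    """new_net: list of randomly initialized weight matrices for task T.
    source_nets: trained networks for T1..Tn, each a list of weight matrices.
    transferability: assumed callable (weights, D) -> score in [0, 1]."""
    for i in range(len(new_net)):
        # Rank the corresponding layer of every source network by transferability
        candidates = [net[i] for net in source_nets if i < len(net)]
        ranked = sorted(candidates, key=lambda w: transferability(w, D), reverse=True)
        donors = ranked[:k]
        if donors:
            # Transfer the top-k "relevant" weights (here simply averaged;
            # shapes are assumed to match across networks)
            new_net[i] = np.mean(donors, axis=0)
    # Partition D into a small fine-tuning subset and a test set
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(D))
    cut = max(1, int(train_frac * len(D)))
    D_train = [D[j] for j in idx[:cut]]
    D_test = [D[j] for j in idx[cut:]]
    return new_net, D_train, D_test
```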
This fundamental human recipe for surviving in a rapidly evolving world, combining prior skills with newly gathered local knowledge, is missing in current AI systems. (Some individual pieces of the puzzle have already been developed in subfields of AI such as active learning and transfer learning.) How then is it possible to rapidly synthesize AI systems, leveraging previous experience and existing knowledge in a new domain, to hit the ground running? Solving this problem requires rethinking the fundamentals of existing AI architectures through the development of loosely coupled elastic architectures that can interact with humans and other AI systems, draw upon their knowledge and skills gained from previous experience, and collaboratively solve interdisciplinary problems.

3.3 Expanding the Reach of AI Through Synthesizable AI Using Peer Learning

Figure 5 depicts a loosely coupled Synthesizable AI architecture. The top layer provides the reasoning, learning, and knowledge representation functionalities. It includes models that represent human background knowledge; multiple generative models exist, such as Hidden Markov Models (HMM). In addition, SAI includes a transfer learning and an analogical reasoning framework, a deep neural network model (DNN), a statistical model (like statistical region merging), hypergraph-based models for large-scale inference together with heuristics to prune the search space, frameworks for active semi-supervised and online learning, and an automatically curated belief store (based on autoencoders) that manages the beliefs of humans and AI systems.

Fig. 5. Synthesizable AI architecture.

The middle layer allows deployment, reconfiguration, and collaboration among AI systems solving diverse problems using an elastic peer-to-peer agent architecture that exploits the top layer and provides agility to it through dynamic agent synthesis and deployment based on declaratively specified knowledge in near real time. That is, the agents use the reasoning engines, as well as rules learnt by the learning engines, to process information, learn from other agents or human expertise through transfer learning and analogical reasoning, provide classification, and make decisions. It is this layer that allows meta-learning for handling dynamically available human expert knowledge and for dealing with concept drift. It provides a single programming interface to synthesize agents. Furthermore, the same layer enforces hot deployment of these agents under operating conditions by leveraging the third layer described below. The organization of the agents in this layer can be flat (peer-to-peer) or hierarchical, where agents in upper layers are built by composing those in lower layers and can perform higher-level tasks.

The third layer is a high-performance run-time execution middleware that enables automated agent deployment and redeployment in real time through persistent hot-swapping, provides runtime monitoring for the agents, interfaces with sensors and actuators, and provides a distributed key-value store for publishing and subscribing to information by agents and sensors. Agents in the second layer can tune the runtime execution environment for optimal performance. Figure 6 shows an example flow for the Synthesizable AI architecture. The architecture creates and combines a feedback-based meta-learning paradigm that continuously monitors the performance and relevance of existing and emerging data sets.
In case the data characteristics change drastically (e.g., in streaming video analytics, where the background changes from lighted to dark as day gives way to night), a continually evaluated metric may indicate that the performance of an agent has fallen below a threshold (for example, in the case of video analytics, an unacceptable number of track overlaps, jumps, and drifts). SAI responds to this situation by dynamically replacing this agent with another more appropriate to the altered situation, or by adapting the former by transferring knowledge to it from agent(s) already experienced in such situations. A measure of transfer is used to determine which agent(s) the knowledge is transferred from in the latter case.

Fig. 6. Synthesizable AI flow diagram.

The Synthesizable AI architecture provides a practical approach to combining a multi-agent-based architecture with machine reasoning and learning. It leverages distributed and dynamic multi-agent synthesis to provide the following key features: (a) dynamically incorporating the contextual knowledge from experts into the learning system; (b) selectively using multiple learners to adapt to situation changes; (c) enabling a never-ending learning system to deal with concept drift; and (d) enabling transfer of knowledge between agents solving problems in different domains. The integrated system provides near real-time response to rapidly changing situations without quality degradation or disruption in service commitments. The architecture allows a marketplace of AI systems, which cooperate and learn from each other to solve interdisciplinary problems, to be rapidly created, deployed, and adapted (Fig. 6).

4 Evaluation and Commercialization: A New Revolution for the Next Decade

While SAI is still a work in progress, it has been commercialized by AutoPredictiveCoding LLC (http://autopredictivecoding.com) in the vertical of automated machine diagnostics. The resulting SpotCheck application [4] provided real-time machine diagnostics from emitted sounds, vibrations, and magnetic fields (Fig. 7). As deterioration of the machine lubricants, bearings, brushes, or other components occurs, very subtle changes also occur in the sounds and vibrations of the machine as it continues to operate. These sounds can be analyzed to estimate the oil quality, vacuum level, belt tension, bearing condition, and other elements, and provide real-time indications of potential internal failure. This analysis was used to drive systems longer, pushing them to their limits, while avoiding catastrophic failure and saving millions of dollars each year.

Fig. 7. Using automated diagnostics to prolong the life of industrial machinery.

4.1 Automated Machine Learning System Now Used by NASA

For terrain recognition (Figs. 8 and 9) [5, 6], the Advanced Supercomputing Division at NASA Ames has been working with Louisiana State University to blend deep learning techniques for use on existing neural networks to create a robust satellite dataset analysis system. Using a massive survey database consisting of over 330,000 scenes from across the United States, the system has been able to quickly train and learn relevant patterns. The average image tile is 6000 pixels wide and 7000 pixels deep, comprising approximately a 200 MB file for each image. The entire dataset consists of 65 TB covering a ground sample distance of one meter.
By using the SAI technology and synthesized AI, the networks can then be trained one layer at a time across very large and noisy datasets to provide the necessary fidelity for automatic terrain recognition and terrain authentication.

Fig. 8. Sample images from the SAT-4/SAT-6 dataset [3].

Fig. 9. Automated tree cover estimation.

The technology has most recently been used for automatic yield prediction (Fig. 10) and automatic infrastructure tuning [7, 8]. Through a collaboration with NASA Ames Research Center, SAI has recently been applied to determine tree cover areas and agricultural areas in California (Fig. 9). These activities will assist in monitoring potential plant disease areas in remote, inaccessible areas requiring USDA intervention.

Fig. 10. Automatic yield prediction.

4.2 SAI: A Potential Solution for US Department of Agriculture Use in Yield Prediction

Another application, the prediction of agricultural yields based upon evaluation of complex datasets, provides an excellent foundation for the evaluation of these large data sets and the establishment of automatic yield prediction, as depicted in Fig. 10. In the figure, colors delineate the predicted yield based upon an original LANDSAT tile that has been analyzed for the specific patterns most likely to yield higher growth.

Yet another emerging application is the use of Synthesizable AI to analyze and automatically color images. This will have enormous application in a variety of areas, including undersea exploration and deep space exploration, as well as analysis of remote area activities. Figure 11 depicts the application's use in automatically coloring a black-and-white terrain landscape through analysis of specific features and the system's capacity to "self-learn" based upon slight variations of terrain texture.

Fig. 11. Automatic terrain landscape coloring.

The search-based program/agent generation facility has already been used for intelligent tutoring applications in high school math education [9, 10], automated drug discovery [11], and automated program visualization [12]. These applications will continue to expand in the future.

IBM's Watson has been used commercially in IoT and the automotive industry, in social media campaigning, in medical diagnosis, in image interpretation in radiology, in natural language processing and speech recognition, in education, in financial services, in supply chain management, and in commerce. Recently, there have been applications of Watson to automated material discovery. SAI has been used in a variety of domains including automated diagnostics for industrial machinery [4], satellite image understanding [5, 6], infrastructure tuning [7, 8, 13], education [9, 10, 14], program execution visualization [12], noisy natural language processing [15], and automated drug discovery [11, 16], some of which have not yet been addressed by Watson. One of the more promising future applications of Synthesizable AI will be automatic drug discovery, an area we are only now beginning to envision. SAI technology is currently competing for the AI XPRIZE with AI-based automated drug discovery as its target.

5 The Future: Automatic Drug Discovery

In this age of vaccines and antibiotics, there is still a constant effort to find new drugs to combat illnesses for which there are no known cures.
There is a need to discover replacements for existing drugs targeted at pathogens that have become resistant to current drugs. There is also a need to develop new drug therapies for health issues adversely affecting the lives of hundreds of millions of people every day. Indiscriminate use of antibiotics has resulted in pathogens developing drug resistance, producing "superbugs" (http://www.cdc.gov/drugresistance). Although multidrug resistance in pathogens is growing fast, the number of new drugs being developed to treat bacterial infections has reached its lowest point since the beginning of the antibiotic era. The resistance is particularly problematic in the Gram-positive organisms S. aureus, E. faecalis, and S. pneumoniae, as well as a number of Gram-negative organisms including K. pneumoniae, A. baumannii, and P. aeruginosa. Hence, there is a dire need to develop new platforms and approaches to discover antibacterial agents against novel molecular targets. Not only are new drugs not being created, but the existing process of creating drugs is slow, inefficient and costly. There is a desperate need to identify new antibiotics and antimicrobials rapidly, as opposed to the time normally taken to create a drug. The solution is to develop a technique to construct libraries of molecules with the end goal of finding and developing new antibiotics and antimicrobial agents in a more efficient and cost-effective manner.

Our Synthesizable AI-based approach (in collaboration with Dr. Brylinski from LSU Biochemistry) can automatically synthesize targeted drug molecules (see http://brylinski.cct.lsu.edu/content/molecular-synthesis for the eSynth tool), filter candidates based on chemical criteria (such as being an antibiotic) [11], analyze 3D image models of the pathogen, automate clinical testing for side effects, and predict the candidate or candidates most likely to succeed. Our engine, eSynth, generates target-directed libraries using a limited set of building blocks and coupling rules mimicking active compounds. Given a set of initial molecules, eSynth synthesizes new compounds to populate the pharmacologically relevant space. The building blocks [16] of eSynth are rigids (inflexible fragments, often a single or fused aromatic group) and linkers (flexible fragments connecting rigid blocks).

The eSynth software rapidly generates a series of compounds with diverse chemical scaffolds complying with Lipinski's criteria for drug-likeness. Although these molecules may have different physicochemical properties, the initial fragments are procured from biologically active and synthetically feasible compounds. eSynth can successfully reconstruct chemically feasible molecules from molecular fragments. Figure 12 shows a 19-atom molecule rebuilt using eSynth; the process involves decomposition of the original 19-atom molecule through fragmentation and subsequent rebuilding into potentially more useful structures [16]. Furthermore, in a procedure mimicking the real application, where one expects to discover novel compounds based on a small set of already developed bioactives, eSynth can generate diverse collections of molecules with the desired activity profiles.

Fig. 12. A 19-atom molecule rebuilt using eSynth [9].

Research activity is ongoing in several new, emerging areas, as outlined in the following paragraphs.

5.1 Antibiotic/Drug Filter

The goal is for eSynth to synthesize new compounds to populate the pharmacologically relevant space. We use Lipinski's Rule of Five to ensure that the synthesized compounds have drug-like properties. Because the number of possible combinations grows exponentially with the number of molecular fragments, the Rule of Five is applied to exclude compounds that do not satisfy drug-like criteria.
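A minimal sketch of such a Rule-of-Five drug-likeness filter is shown below; RDKit and the standard thresholds are assumptions for illustration, as the paper does not name the toolkit used.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_rule_of_five(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    violations = sum([
        Descriptors.MolWt(mol) > 500,        # molecular weight
        Descriptors.MolLogP(mol) > 5,        # lipophilicity (logP)
        Lipinski.NumHDonors(mol) > 5,        # hydrogen-bond donors
        Lipinski.NumHAcceptors(mol) > 10,    # hydrogen-bond acceptors
    ])
    return violations == 0                   # strict form: no violations allowed

def filter_candidates(smiles_list):
    return [s for s in smiles_list if passes_rule_of_five(s)]

# filter_candidates(["CC(=O)OC1=CC=CC=C1C(=O)O"])  # aspirin passes the filter
```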
5.2 Side-Effect/Toxicity Filter

Even after pharmaceutical companies spend years and billions of dollars creating a new drug, it is often the case that the drug has undesirable side effects that render it unusable. To detect side effects, the companies must conduct extensive clinical trials that consume years of effort and billions of dollars. All that money and effort is ultimately wasted if the drug has a negative side effect, in which case it is rejected by the FDA.

5.3 Synthetic Accessibility Analysis

Natural products are a source of ingredients for many drugs. Some of these natural products are hard to acquire, and it is also difficult to analyze the molecular structure of a compound for negative side effects. We use deep neural network models that, from the molecular structure of a natural product, can predict its synthetic accessibility score. For compounds with high scores, it is possible to synthesize them using eSynth and analyze their side effects.

5.4 Automatic Drug Repurposing

Based on features extracted from 3D image models of the pathogens and those of drugs, learning models will be used to repurpose existing drugs for new diseases.

5.5 Other Future Applications

Another application that SAI has been focusing on is automated vulnerability analysis. SAI has been able to automatically localize the "attack surface" of an application. Current research is focusing on automatically patching such vulnerabilities as well as extending the analysis to large cyber infrastructures. SAI is also being targeted at the automatic lighting control domain in smart buildings.

6 Limitations

For continued expansion to synthesize new compounds in pharmacology, SAI and eSynth must be strengthened through the use of more extensive deep neural networks to determine side effects. We are currently evaluating and using deep neural network models to predict possible side effects from the molecular structure and the bondings in the drug molecule.

7 Conclusion

The future for artificial intelligence remains bright. Each day, new technologies such as Synthesizable AI can be called upon to rapidly assume even "deeper roles" in interdisciplinary areas ranging from open street maps, cybersecurity and power systems to kidney stone surgery, through analysis of extreme and complex events, ever larger datasets, and the utilization of newer computing architectures [17, 18].

References

1. Iyengar, S., Mukhopadhyay, S., Steinmuller, C., Li, X.: Preventing future oil spills with software-based event detection. IEEE Comput. 43(8), 95–97 (2010)
2. Karki, M., DiBiano, R., Basu, S., Mukhopadhyay, S.: Core sampling framework for pixel classification. In: Proceedings of the International Conference on Artificial Neural Networks (2017)
3. Basu, S., Karki, M., Mukhopadhyay, S., Ganguly, S., Nemani, R., DiBiano, R., Gayaka, S.: A theoretical analysis of Deep Neural Networks for texture classification. IJCNN 2016, 992–999 (2016)
4. DiBiano, R., Mukhopadhyay, S.: Automated diagnostics for manufacturing machinery based on well regularized deep neural networks. Integr. VLSI J. 58, 303–310 (2017)
5. Basu, S., Ganguly, S., Nemani, R., Mukhopadhyay, S., Zhang, G., Milesi, C., et al.: A semi-automated probabilistic framework for tree cover delineation from 1-m NAIP imagery using a high performance computing architecture. IEEE Trans. Geosci. Remote Sens. 53(10), 5690–5708 (2015)
6. Basu, S., Ganguly, S., Mukhopadhyay, S., DiBiano, R., Karki, M., Nemani, R.: DeepSat: a learning framework for satellite imagery. In: Proceedings of ACM SIGSPATIAL 2015 (2015)
7. Sidhanta, S., Golab, W., Mukhopadhyay, S., Basu, S.: Adaptable SLA-aware consistency tuning for quorum-replicated data stores. IEEE Trans. Big Data 3, 248–261 (2017)
8. Sidhanta, S., Mukhopadhyay, S.: Infra: SLO aware elastic auto scaling in the cloud for cost reduction. In: IEEE BigData Congress, pp. 141–148 (2016)
9. Alvin, C., Gulwani, S., Majumdar, R., Mukhopadhyay, S.: Synthesis of geometry proof problems. In: Proceedings of AAAI, pp. 245–252 (2014)
10. Alvin, C., Gulwani, S., Majumdar, R., Mukhopadhyay, S.: Synthesis of solutions for shaded area geometry problems. In: Proceedings of FLAIRS (2017)
11. Naderi, M., Alvin, C., Ding, Y., Mukhopadhyay, S., Brylinski, M.: A graph-based approach to construct target focused libraries for virtual screening. J. Cheminform. 8, 14 (2016)
12. Alvin, C., Peterson, B., Mukhopadhyay, S.: StaticGen: static generation of UML sequence diagrams. In: Proceedings of the International Conference on the Foundational Aspects of Software Engineering (2017)
13. Mukhopadhyay, S., Iyengar, S.S.: System and architecture for robust management of resources in a wide-area network. US Patent 9,240,955, issued January 2016
14. Alvin, C., Gulwani, S., Majumdar, R., Mukhopadhyay, S.: Synthesis of problems for shaded area geometry reasoning. In: Proceedings of AIED (2017)
15. Basu, S., Karki, M., Ganguly, S., DiBiano, R., Mukhopadhyay, S., Gayaka, S., Kannan, R., Nemani, R.: Learning sparse feature representations using probabilistic quadtrees and deep belief nets. Neural Process. Lett. 1–13 (2016). https://doi.org/10.1007/s11063-016-9556-4
16. Liu, T., Naderi, M., Alvin, C., Mukhopadhyay, S., Brylinski, M.: Break down in order to build up: decomposing small molecules for fragment-based drug design with eMolFrag. J. Chem. Inf. Model. 57, 627–631 (2017)
17. Boyda, E., Basu, S., Ganguly, S., Michaelis, A., Mukhopadhyay, S., Nemani, R.: Deploying a quantum annealing processor to detect tree cover in aerial imagery of California. PLoS ONE (2017)
18. Ganguly, S., Basu, S., Nemani, R., Mukhopadhyay, S., Michaelis, A., Votava, P., Milesi, C., Kumar, U.: Deep learning for very high resolution imagery classification. In: Srivastava, A., Nemani, R., Steinhaeuser, K. (eds.) Large-Scale Machine Learning in the Earth Sciences. CRC Press, Boca Raton (2017)

Cognitive Natural Language Search Using Calibrated Quantum Mesh

Rucha Kulkarni, Harshad Kulkarni, Kalpesh Balar, and Praful Krishna

Arbot Solutions Inc., dba Coseer, San Francisco, CA 94105, USA, praful@coseer.com

Abstract. This paper describes the application of a search system for helping users find the most relevant answers to their questions from a set of documents. The system is developed based on a new algorithm for Natural Language Understanding (NLU) called Calibrated Quantum Mesh (CQM). CQM finds the right answers instead of documents. It also has the potential to resolve confusing and ambiguous cases by mimicking the way a human brain functions.
The method has been evaluated on a set of queries provided by users. The relevant answers given by the Coseer search system have been judged by three human judges as well as compared to the answers given by a reliable answering system called AskCFPB. Coseer performed better in 57.0% of cases and worse in 16.5% of cases, while the results were comparable to AskCFPB in 26.6% of cases. The usefulness of a cognitive computing system over a Microsoft-powered, keyword-based search system is discussed. This is a small step toward enabling artificial intelligence to interact with users in a natural manner, as in an intelligent chatbot.

Keywords: Chatbot · Cognitive computing · Natural Language Processing (NLP) · Cognitive search · Natural Language Understanding (NLU)

1 Introduction

Natural Language Search and one of its prominent applications, chatbots, are popular topics in the field of technology as well as research. Their popularity can be attributed to their tremendous potential and promise in several fields [1–6]. There are several areas of business, for example brand-building, customer acquisition, product discovery, and support, that require human interaction. There is always a high cost related to human labor, as well as inaccuracy related to fatigue and general human biases and errors. An automation system based on Natural Language Search can remove several of these problems by simply replacing the human. A well-designed chatbot, for example, can be used to facilitate the internal processes of a business. A chatbot, if successfully developed as a subject matter expert, can be deployed to any part of the business so that any employee or customer can retrieve important information from it at any time.

However, in their current state, a clear majority of systems based on NLU are not well designed or accurate enough. High accuracy is necessary so that business managers can entrust them with mission-critical roles and tasks. Highly advanced Artificial Intelligence (AI) technologies like deep learning have been tremendously successful in analyzing structured data [7–9]. However, when it comes to unstructured data, especially processing natural language like English, they seem to fail. For a technology like deep learning to be successful, it needs a considerable amount of training data, which might not be available to enterprises. Moreover, such data must be annotated by subject matter experts, which can be prohibitively expensive.

Most intelligent natural language systems like chatbots fail because they are unable to interact with and process content the way human beings do. Frequently, they are based on keyword correlation, which does not enable them to "understand" the relations between words and their context. Humans process information around certain ideas. Ideas are entities that are expressed by words and phrases and the complex relationships between them, something computer systems cannot trivially handle. Thus, they are unable to retrieve meaning from information. Some essential characteristics of human thought are: focusing on ideas rather than words, prioritizing ideas based on significance and credibility, and knowing when there is not enough information available to make a decision. Intelligent machines capable of producing high accuracy can be designed based on the imperatives mentioned above without relying on keywords.
They can extract ideas, order them, store them in a hierarchical data structure and even derive context from live conversations. This type of approach o?ers a signi?cant advantage over traditional chatbots in terms of capability and performance. This unique paradigm of intelligent understanding of information is captured in one branch of AI technology: cognitive computing [10–15]. Cognitive computing can be used to automate tedious, repetitive and language-driven work?ows that do not require human intelligence anymore. This would allow the humans to focus on creativity and judgment while the machines take care of the mundane jobs. In this work, we have developed a Natural Language Search system that can help users with their queries. It analyzes the query placed by the user and suggests relevant answers from a list of Frequently Asked Questions (FAQ). The reported answer may be a direct match with an existing entry in the FAQ or produce a solution that is part of some other entry. To evaluate the performance of the system, we used a human judge as well as compared the results with that of AskCFPB [16]. AskCFPB is a well-established and trustworthy resource to get answers maintained by the Consumer Finance Protection Bureau of United States Government. It covers a variety of topics including bank accounts, credit reports, credit scores, debt collection, student loans and mortgages. There is a search box on the website where the users can enter their queries and look at related questions and answers. This system is powered by popular Microsoft search engine – Bing. The rest of the paper describes the method, the evaluation criteria used, and the results of the evaluation. We close with discussion on future work already underway at Coseer. Cognitive Natural Language Search Using Calibrated Quantum Mesh 679 2 Methods 2.1 Tactical Cognitive Computing All Coseer systems are built using Tactical Cognitive Computing (TCC). TCC is a programming paradigm with a focus on high accuracy, short training times and low cost. Tactical Cognitive Computing has been developed as a solution to traditional cognitive computing systems that are expensive and take years to implement. To be called tactical a cognitive computing system must be highly accurate. While lower level accuracy has been accepted and even lauded in the consumer world, the businesses need highly reliable systems. A TCC system must also be quick to train. The key factor in enabling a quick training time is a system’s ability to train without annotated training data. Annotation of training data typically needs subject matter experts that are very expensive. Annotation is also a time-intensive e?ort – some prominent implementations have taken years to train the data. Finally, a TCC system must be con?gurable, at low cost, to a wide variety of situa- tions in an enterprise. A key component of such con?gurability is the ability of tactical cognitive computing systems to be deployed over commoditized hardware in public cloud, private cloud or on-premise. Coseer’s implementation of TCC for natural language uses our work with Calibrated Quantum Mesh (CQM) and cognitive calibration, apart from various techniques in natural language processing, natural language understanding, and arti?cial intelligence. 2.2 Calibrated Quantum Mesh Calibrated Quantum Mesh (CQM) is a novel AI algorithm that is speci?cally built for understanding natural language as human beings do. 
It does not need annotated training data and reduces the need for unannotated data to a fraction. CQM works on three basic principles, as shown in Fig. 1: Multiple Meanings. CQM recognizes that any symbol, word or text can have more than one meaning or quantum states with di?erent probabilities. It considers all these possible states to ?nd the most probable answer. Interconnectedness. CQM recognizes that everything is correlated to each other and modi?es each other’s behavior. Speci?cally, each item can in?uence the probability distribution across quantum states of all other items it is connected to. CQM considers such mesh of interconnections to reduce error. Calibration. CQM sequentially adds all available information to help converge the mesh into a single meaning. The calibration process is fast, accurate and e?cient in detecting any lacunas. The calibrations are implemented on training data, contextual data, reference data and other known facts about the problem. Sometimes these cali- brating systems called Calibrating Data Layers are handled by an independent CQM module or another AI process. 680 R. Kulkarni et al. Fig. 1. Basic tenets of Calibrated Quantum Mesh (CQM). Cognitive Natural Language Search Using Calibrated Quantum Mesh 681 When the training data is passed through CQM, it de?nes many of the mesh’s inter- relationships. Where applicable, data layer algorithms learn from such data. Often new relations and nodes are added to the mesh, making it smarter. When a work?ow is modeled by CQM, the creation of any black boxes is avoided to the maximum extent. This ensures transparency and interpretability of the models. We note that keywords are not important for CQM in processing natural language. Complex ideas are represented by di?erent parts of the mesh with varying complexity. This enables the algorithm to handle ?uid, multi-state and inter-connected knowledge – some inherent criteria of natural language. The algorithm can also learn from non-direct corpora. For example, while assisting a UK tax advisory, it was executed over HMRC.com, Law.com, Investopedia and a proprietary glossary. The most important advantage of CQM is that it does not need annotated training data. As a result, training a CQM model is very fast and cost-e?ective. It also allows iterations over the training process leading to highly accurate results. This capability quali?es CQM based systems to be part of TCC. 2.3 Cognitive Natural Language Search System A cognitive search system can be applied to understand and interpret textual data in a natural way (Fig. 2). We used an algorithm based on CQM, which is also a TCC system, to develop a Natural Language Search system. We applied the Coseer system to assist users of AskCFPB with their questions. Fig. 2. Overview of the cognitive search system. The search system has two main steps: ingestion and search. In the ingestion step, documents are interpreted by the CQM and processed into relevant data structures. In this case, it was the FAQs that were processed and stored in a database. Then a search module takes the query as input and searches the database for the relevant text or a snippet. The relevant text is then sent to the user as a possible answer to the query. 682 R. Kulkarni et al. 3 Evaluation Criteria The cognitive search system was evaluated in the following ways: Accuracy. This criterion measures how accurately the system answers the queries. It was calculated by dividing the number of queries correctly answered by the total number of queries. 
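Concretely, such a tally can be computed in a few lines. The sketch below is hypothetical and is not Coseer's evaluation code; in particular, aggregating the three judges' marks by majority vote is our assumption, and it relies on the top-three relevance judgment described next.

from typing import List

def query_answered(judge_marks: List[bool]) -> bool:
    """A query counts as answered if a majority of judges found at least one
    of the system's top-three results satisfactory (assumed aggregation)."""
    return sum(judge_marks) > len(judge_marks) / 2

def accuracy(all_marks: List[List[bool]]) -> float:
    """Accuracy = correctly answered queries / total queries."""
    answered = sum(query_answered(marks) for marks in all_marks)
    return answered / len(all_marks)

# Example with hypothetical data: three judges' verdicts for four queries.
marks = [[True, True, False], [True, True, True],
         [False, False, True], [True, False, True]]
print(f"accuracy = {accuracy(marks):.1%}")  # 75.0%
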
The search system was tested with 158 queries. For each query, the top three results returned by the system were evaluated by three human judges. The results were marked as relevant if any of the top three results satisfactorily answered the question. Comparative Performance. This evaluation criterion demonstrates how well the search system performs as compared to AskCFPB search. AskCFPB was selected for comparison because it is the most closely related search system. This system is powered by Bing Search Engine. For this evaluation criteria, the same 158 queries were tested on both the search system. Three human judges evaluated the top snippet in the following categories: COMPARABLE, COSEER_BETTER and ASKCFPB_BETTER, according to which result seemed more relevant to the query. While AskCFPB returns documents, not answers, we considered most relevant snippet identi?ed by Bing Search Engine. We acknowledge that this is a very stringent evaluation criterion towards Coseer systems. Amount of Training Data Necessary (Not Used). A third evaluation criterion that is not used in this evaluation is the amount of training data that is necessary to train a tactical cognitive system. Typically, TCC systems need a fraction of data than other AI systems, and do not need them to be annotated. In this evaluation an untrained model was used. 4 Results and Discussion For the accuracy calculation, 130 out of the 158 queries were correctly answered by the Coseer cognitive search system, as evaluated by the human judges. This computes to 82.3% accuracy. This seems to be reasonable considering that the system was not trained for this subject matter. For the comparative study, 158 new queries were considered. Figure 3 shows the results of the comparative study. Out of the 158 queries, 26.6% showed comparable results. In 16.5% of the cases, AskCFPB performed better than Coseer and in 57.0% of the cases, Coseer performed better than AskCFPB. To get further insight into why one system works better than the other, we reported a couple of representative cases. Cognitive Natural Language Search Using Calibrated Quantum Mesh 683 Fig. 3. Results of comparative performance between AskCFPB and Coseer. Table 1 shows two queries where Coseer performed better than AskCFPB. Table 1. Cases where Coseer performed better than AskCFPB. Query Coseer answer AskCFPB answer How long do mortgages normally last? How can I determine how long it will take me to pay o? my mortgage loan? What exactly happens when a mortgage lender checks my credit? What type of rent information is on my credit report? At least one of the big three consumer reporting agencies, Experian, uses rental payment and collection information in its credit reports What is a credit report? - Consumer Financial Protection… There are several reasons behind the better performance of Coseer over AskCFPB. Unlike AskCFPB, Coseer considers the context and the meaning of the query. It provides emphasis on the functional words like ‘how long’ instead of matching keywords. Similarly, Coseer considers all other possible meanings of the search query to execute its search. Special attention is given to the important phrases, abbreviations, and colloquialisms. Table 2 reports a couple of cases where AskCFPB performed better than Coseer. The second query in Table 2 is of special interest. Although the question here is whether paying rent on time would strengthen credit history, the information about a weakening of the credit history due to late payment is very relevant. 
Even though it appears to be diametrically the opposite answer, AskCFPB has correctly recognized such an answer as relevant. Coseer algorithm can be further improved by teaching it how to handle such cases. 684 R. Kulkarni et al. Table 2. Cases where AskCFPB performed better than Coseer. Query Coseer answer AskCFPB answer What info does a credit report show? If the investigation shows the company provided wrong information about you, or the information cannot be veri?ed, the company must notify all the credit reporting companies to which it provided the wrong information… A credit report is a statement that has information about your credit activity and current credit situation such as loan paying history and… Can I build my credit history by paying my rent on time? You have a steady source of income and a good record of paying your bills on time. Lenders will look at your ability to repay the mortgage… Could late rent payments or problems with a landlord be in my credit report? 5 Limitations, Conclusions and Future Work The most signi?cant limitation of the study is that an untrained AI system was used. In future, it is necessary to train a system to achieve more than 90% accuracy as per the ?rst evaluation criterion. In that study, we will also be able to compare the two systems as per the third evaluation criterion - how much data is necessary to train the system? Although Natural Language Search is an exciting and popular technology with ever increasing areas of applications, its ability to interact with people in a natural manner remains at an early stage. We applied a tactical cognitive computing system in conju- gation with calibrated quantum mesh to develop a chatbot that helps customers with their questions. The search system demonstrated reasonable accuracy in assisting the users to ?nd the answers to their queries. Although there are several opportunities to improve, this comparative study demonstrates the usefulness of such an approach over typical key-word based natural language processing systems. It recommends cognitive computing as a key player in solving di?cult problems that require humanlike thinking, ability to reason and extract meaning from information. We plan to extend CQM for other basic cognitive processes like processing intona- tions in speech, translating ideas back into words and perhaps processing and expressing unarticulated thoughts and emotions in text. Idea-oriented chatbots can be the key to assimilating human and computing worlds. Coseer’s solutions demonstrate that we are already capable of designing and training machines to process information like humans do, talk like humans do and provide busi- ness value as humans do. Since the chatbots can run round the clock, at a fraction of the cost of a human resource and with high accuracy, it is perhaps not an overstatement to say that the future of the chatbot could be the future of all business. Cognitive Natural Language Search Using Calibrated Quantum Mesh 685 Acknowledgment. We thank the larger team of Coseer for developing the system. We also thank Obaidur Rahaman for assistance in preparing the manuscript. References 1. Ghose, S., Barua, J.J.: Toward the implementation of a topic speci?c dialogue based natural language chatbot as an undergraduate advisor. In: 2013 International Conference on Informatics, Electronics and Vision (ICIEV), pp. 1–5 (2013) 2. Heller, B., et al.: Freudbot: an investigation of chatbot technology in distance education. 
In: EdMedia: World Conference on Educational Media and Technology, pp. 3913–3918 (2005) 3. Hill, J., et al.: Real conversations with artificial intelligence: a comparison between human–human online conversations and human–chatbot conversations. Comput. Hum. Behav. 49, 245–250 (2015) 4. Huang, J.Z., et al.: Extracting chatbot knowledge from online discussion forums (2007) 5. Jia, J.: The study of the application of a web-based chatbot system on the teaching of foreign languages. In: Society for Information Technology and Teacher Education International Conference, pp. 1201–1207 (2004) 6. Jia, J.Y.: CSIEC: a computer assisted English learning chatbot based on textual knowledge and reasoning. Knowl.-Based Syst. 22, 249–255 (2009) 7. Goodfellow, I., et al.: Deep Learning, vol. 1. MIT Press, Cambridge (2016) 8. LeCun, Y., et al.: Deep learning. Nature 521, 436 (2015) 9. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015) 10. Ferrucci, D.A.: Introduction to "This is Watson". IBM J. Res. Dev. 56 (2012) 11. Li, Y., et al.: Cognitive computing in action to enhance invoice processing with customized language translation. Presented at the 2017 IEEE 1st International Conference on Cognitive Computing (2017) 12. McCord, M.C., et al.: Deep parsing in Watson. IBM J. Res. Dev. 56 (2012) 13. Amir, A., et al.: Cognitive computing programming paradigm: a corelet language for composing networks of neurosynaptic cores. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–10 (2013) 14. Cassidy, A.S., et al.: Cognitive computing building block: a versatile and efficient digital neuron model for neurosynaptic cores. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–10 (2013) 15. Esser, S.K., et al.: Cognitive computing systems: algorithms and applications for networks of neurosynaptic cores. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–10 (2013) 16. Dhoat, K.K.: Cognitive Search Technique for Textual Data. College of Engineering, Pune (2013)

Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems

Souvik Sengupta, Jordi Garcia, and Xavi Masip-Bruin
Advanced Network Architectures Lab, CRAAX, Universitat Politècnica de Catalunya, UPC BarcelonaTech, Vilanova i la Geltrú, 08800 Barcelona, Spain
{souvik,jordig,xmasip}@ac.upc.edu

Abstract. As technology rapidly evolves, society as a whole is gradually being surrounded by the Internet. In such a high-connectivity scenario, the recently coined IoT concept becomes a commodity, driving the data generation rate to increase swiftly. To process and manage these data in an efficient way, a new strategy, referred to as Fog-to-Cloud (F2C), has recently been proposed, leveraging two existing technologies, fog computing and cloud computing, in which resources play a pivotal role in managing data efficiently. In these scenarios, vast numbers of interconnected heterogeneous devices coexist, forming a complex set of devices. Managing these devices efficiently requires a proper resources classification and organization. In this paper, we offer a model to classify and taxonomize the whole set of resources, aimed to best suit the Fog-to-Cloud (F2C) paradigm.

Keywords: Fog-to-Cloud (F2C) · Taxonomy · Ontology · Resources classification · Class diagram

1 Introduction

Technologies are rapidly evolving, driving the whole society towards a new era of smart services.
Indeed, day by day, we are moving towards the ‘smart’ to the ‘smarter’ world. As per the United Nation [1], by 2050 about 64% of the developing world and 86% of the developed world will be urbanized. Also as per some statistics [2], by 2050 more than 70% of world population will be living in a smart environment, where most of the things will connect to the network. Gartner Inc. [3] forecasts that by 2020 almost 20.4 billion connected things will be in use worldwide. Also by 2022, M2M tra?c ?ows are expected to constitute up to 45% of the whole Internet tra?c [4]. Beyond these predictions, the McKinsey Global Institute [5] reported in 2015 that the number of connected machines (units) had grown 300% over the last ?ve years. Tra?c monitoring of a cellular network in the US also showed an increase of 250% for M2M tra?c volume in 2011. Also, Cisco [6] predicted that 50 billion objects and devices would be connected to the Internet by 2020. However, although more than 99% 9 c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 687–704, 2019. https://doi.org/10.1007/978-3-030-02686-8_52 688 S. Sengupta et al. of today’s available things in the world remain unconnected, several pieces of evidence de?ne the connectivity trend in di?erent sectors. The following two examples are highlighting the fact. According to a Navigant research report [7], the number of installed smart meters around the world will grow to 1.1 billion by 2022. Another report from Automotive News [8], states that the number of cars connected to the Internet worldwide will increase from 23 million in 2013 to 152 million in 2020. By following up all the trends and above scenarios, it is clear that IoT con-nected devices are going to rule over the smart environment, being a key com-ponent in the whole system. In short, the envisioned ‘smart’ scenario consists in a massive amount of IoT devices, highly distributed over the network, along with a set of highly demanding services, some of them not yet foreseen though. It is also widely accepted that the bene?ts of cloud computing bring to handle high processing and storage services demands. However, it is also recognized that cloud data centres may fail to deal with services demanding strict low latency, mainly due to the distance from the cloud -where the data is to be processed - to the edge -where the data is to be collected, and the user is. As a conse-quence, some critical undesired e?ects, such as network congestion, high latency in service delivery and reduced Quality of Service (QoS) are being experienced [9]. By addressing these problems, the fog computing recently came up, rely-ing on adding processing capabilities between the cloud data centre and the IoT devices/sensors, thus aimed at extending the cloud computing facilities to the edge of the network [9,10]. However, interestingly, it is also recognized the fact that fog computing is not going to compete with cloud computing, instead collaborate both together, intended to provide better facilities to the next gen-eration of computing and networking platforms [10]. Indeed, the whole scenario may be seen as a stack of resources, from the edge up to the cloud, where a smart management system may adequately allocate resources best suiting ser-vices demands, regardless where the resources are, either at the cloud or fog. The recently coined Fog-to-cloud (F2C) architecture [11], has been proposed intended to build such a coordinated management framework. 
Therefore, it is clear that the development and combination of new technologies (i.e., IoT, Cloud, and Fog computing, etc.) o?ers a multi fascinate solution for the future smart scenario. Unfortunately, the enormous diversity of devices makes such a management system not easy to deploy. Indeed, e?cient and proper management of such a set of heterogeneous devices is a crucial challenge for any IoT computing platform to succeed. However, to facilitate the design of the suggested resources coordinated management framework, it is essential to know what the resources characteristics and attributes are, thus building some resources catalogue. This paper aims to identify a resource taxonomy and resource model, suitable for a coordinated F2C system, as a mandatory step towards a real F2C management architecture deployment. The rest of the paper is organized as follows. Section 2 positions the current state of the art. Next, Sect. 3 presents an architectural overview for the coordi-nated Fog-to-Cloud paradigm. In Sect. 4, we show a class diagram to represent Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems 689 our taxonomic view of the F2C resources, and also we discuss on the various taxonomic parameters considered to make the classi?cation of an F2C resource. Following up the previous section, in Sect. 5 we represent and de?ne the gener-alized resource model for the F2C computing platform. To support our resource model, we present some examples of real devices participating in the F2C sys-tem. Finally, some concluding remarks and future directions of our research work given in Sect. 6. 2 State of the Art: Related Work and Motivation For any management system, proper utilization of resources undoubtedly facili-tates an optimal service execution and hence helps to build an e?ective manage-ment solution. Most importantly, to manage the whole set of resources, it is very much essential to have them categorized and classi?ed into a resources catalogue. Apparently, to build such a description is necessary to identify the character-istics and attributes of the resources to be organized. In this paper, we aim at determining a resource classi?cation and taxonomy, for a scenario combining fog and cloud resources, like the one, we envisioned by the F2C. The underlying objective of such a classi?cation is to describe a catalogue of resources, where resources are formally de?ned, thus easing both an e?cient resources utilization and an optimal services execution. In a previous work [12], we put together a comprehensive literature survey, highlighting the resource characteristics for distinct computing paradigms and also observed several interesting ?ndings. We found that, in most cases, hardware components (i.e., memory, storage, processor, etc.), software (i.e., - APIs, OS, etc.) and network aspects (i.e., - protocol, standards etc.) of the devices [13,14] have been considered to classify the edge resources. Even for grid resources hard-ware components have also been studied (i.e., storage capacity, memory etc.), to classify them [15,16]. We recognized the relevance of e?cient network man-agement to build a dynamic computing platform, many references [13,17,18], put the focus to identify the networking standards, technology and bandwidth capacity. Interestingly, after revisiting the literature, we found that in most of the fog and edge computing related work focuses on the network bandwidth as the essential characteristics for e?cient network management. 
It is worth high-lighting the fact that, the closer to the edge resources are, the more signi?cant the impact on the access network is. Indeed, access networks become a criti-cal part of the whole network infrastructure concerning the quality provision-ing, congestion, real capacity and availability and also the part where devices’ mobility brings signi?cant collateral e?ects on performance. Hence, as a sum-mary, network bandwidth - as well as other network attributes at the edge must be undoubtedly considered as a critical characteristic to characterize a resource. Also, di?erent edge devices may use di?erent networking standards and tech-nologies to communicate [13]. So consideration of the networking standards and techniques are also mandatory when categorizing a resource. 690 S. Sengupta et al. Di?erently, in the cloud arena, no such concerns are found for processing, storage, power, or network (i.e., bandwidth) capacities of the cloud resources. Interestingly, researchers have given their focus on managing the security, pri-vacy, and reliability aspects [18,19] in the cloud paradigm. We also found that cost management (i.e., charges for access and utilization of resources), is one of the crucial aspects to build an e?cient Cloud platform [20]. Indeed, several works propose a cost model for system resources and services [18,20]. After a compre-hensive reading (see [12] for more details) we may conclude that: (i) most of the cloud-resources have some unique features - e.g., they are centralized, fault-tolerant [18,20–22] etc.; (ii) in IoT, edge or fog, resources are geographically distributed [12,21,23], while much agiler than cloud resources and suitable for supporting real-time services. In summary, we may quickly assess that there are a signi?cant variety and diversity of system resources, what undoubtedly makes resource categorization a challenging task. 3 An Overview of the F2C Architecture The F2C has been introduced as a framework intended to both optimize the resources utilization and improving the service execution, the latter through an optimal mapping of services into the resources best suiting the services demands. To that end, resources categorization becomes an essential component for a suc-cessful F2C deployment. Consequently, an accurate description of the di?erent attributes and characteristics to be used to categorize a resource e?ciently. Just as an illustrative example, Fig. 1 depicts a picture showing how an F2C deploy-ment in a smart city may look like, mainly representing the technological inte-gration of the Cloud, Fog/Edge and IoT resources. Fig. 1. Fog and cloud resources deployment in a Smart City. Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems 691 It is pretty apparent to observe the fact, that in a smart city, as shown in the Fig. 1, several distinct and heterogeneous fog node devices may be found (i.e., smartphone, smartwatch, car, etc.) and also many IoT devices (i.e., surveillance camera, temperature sensor etc.) can be connected or attached with them. We also identify that several devices may become the leader fog node (i.e., road-side unit, etc.) and each of them serve as the fog service provider of a particular fog area to the smart city. Similarly, many di?erent cloud providers may take over the provisioning of cloud facilities to the citizens. 
The F2C solution, designed to be a coordinated management platform, facilitates optimal management of this broad set of heterogeneous resources (i.e., -IoT devices, fog nodes, cloud resources, etc.). Unquestionably, the supervision of heterogeneous resources is a crucial characteristic of the F2C platform. Thus, before devoting e?orts to categorize the resources, it is mandatory to revisit what the main aspects of F2C are. In [11], the F2C is proposed as a combined, hierarchical and layered architecture, where cloud resources reside at the top layer, the IoT layer at the bottom consists in the set of IoT devices, and several intermediate fog layers are considered bringing together the collection of heterogeneous edge devices. In Fig. 2, we represent the hierarchical structure of the F2C architecture. Following the hierarchical structure of the F2C architecture and considering the smart city scenario, we found that the leader fog node of each fog area is responsible for communicating with the upper layer resources in the F2C platform. Also, the leader fog node is responsible for informing the upper layer resources about the total resource information of its fog area. It is worth emphasizing the fact that the concept of fog node has not widely converged towards a unique de?nition yet. Although in a general view, this paper is only using such fog node concept to represent a device belonging to fog (or by extension to F2C), readers interested in this topic may ?nd a more elaborated discussion on its meaning in [24]. Fig. 2. Hierarchical architecture of F2C paradigm. 692 S. Sengupta et al. Authors in [11], highlight the need to have a comprehensive devices control and management strategy to build an e?cient F2C system. As said earlier, it is essential to correctly identify the resources characteristics and behaviour for a successful F2C deployment. Indeed, by adequately identifying resources charac-teristics and their behaviours - helps to build an e?cient taxonomy of resources of the F2C paradigm. This taxonomy would help the services to resources map-ping process and thus optimizing the service execution. In the next Sect. 4, we present the taxonomic view of F2C resources and later, in Sect. 5, we present a generalized resource model for the F2C paradigm. 4 Proposing Taxonomy of the F2C Resources The enormous diversity, heterogeneity and variety envisioned for the whole set of resources from the edge up to the cloud, makes resources management in Fog-to- Cloud a challenging e?ort. From a broad perspective, it is pretty evident that the closer to the top (i.e., cloud) the larger the capacities are. Thus, we may undoubtedly assess that computation, processing and storage capabilities are higher in the cloud than in fog and higher in fog than in the edge. Interestingly, in the F2C envisioned scenario this assessment is even more elaborated, leveraging the di?erent layers foreseen for fog. Indeed, in F2C di?erent layers are identi?ed to meet di?erent characteristics of distinct devices. Thus, considering the current state of the art contributions, the speci?c layers architecture de?ned in F2C and the potential set of attributes to characterize each one of them, we propose a taxonomy for characterizing resources in an F2C system, as described next. In the collaborative model foreseen in an F2C system, devices may partici-pate as either ‘Consumer’, ‘Contributor’, or ‘Both’ of them. 
When a device acts as a ‘Consumer’, the device gets into the F2C system to execute services, thus being a pure resources consumer. When acting as ‘Contributor’, the device offers its resources to both itself and third users (in a future collaborative scenario) to run services. Finally, some resources can act as ‘Both’, hence not only accessing (i.e., consuming) some services but also contributing their resources to support services execution. Thus, according to the participation role, in a first approach resources in an F2C system may be classified into three distinct types. However, although the participation role is a key aspect, many other attributes and characteristics must be considered as well in order to accommodate the large heterogeneity of resources, including Device attributes (hardware, software, network specification, etc.), IoT components & Attached components (sensors, actuators, RFID tags, other attached device components), Security & Privacy aspects (device hardware security, network security and data security), Cost information (chargeable device, non-chargeable device), and History & Behavioural information (participation role, mobility, life span, reliability, information of the device location, etc.).

Fig. 3. The ontology-based F2C resource classification.

4.1 Taxonomy Modeling: Based on Ontology

In this paper, we present an F2C resource taxonomy model leveraging a proposed ontology. To present the ontology-based resource taxonomy model in the F2C paradigm, we adopt the classification method proposed by Perez [25]. According to the ontological model, the modeling elements are divided into five basic modeling primitives: classes, relations, functions, axioms, and instances. The ontology model O is depicted in Fig. 3 and is expressed as:

O = {C, R, F, A, I}   (1)

C represents the classes or concepts, which can be further classified and subdivided into basic classes Ci. R represents the collection of relations, mainly containing four basic types: part-of, kind-of, instance-of and attribute-of. F represents the collection of functions, which can be formalized as:

F: C1 × C2 × C3 × ... × Cn−1 → Cn   (2)

A represents the collection of axioms, and I represents the collection of instances. Based on the ontological model described above, this paper analyzes the basic elements C (class) and R (relation), according to the attributes and expected behaviour of the whole set of resources in an F2C system. This analysis will help to both propose the resource taxonomy for F2C and build the resource description model for F2C.

4.2 F2C Resource Taxonomy: View of the Class Diagram

Adopting the ontological model described above and following the attributes and expected behaviour of the F2C system resources, Fig. 4 depicts, in the form of a class diagram, the taxonomy proposed for F2C resources.

Fig. 4. Class diagram of the F2C resource taxonomy: a completed model in Protégé.

According to the proposed class diagram, all resources in F2C can be initially classified according to five different classes, each one further divided into several sub-classes. Next, we present a brief description of each class and its subclasses. 1. Device attributes - Devices participating in an F2C system can be classified according to their hardware, software, networking specification and also by considering their type.
– Hardware components - In an F2C system, storage, processor, main mem-ory, graphics processing unit, and power source information of a device help to classify them further. – Software components - To participate in any service-oriented comput-ing paradigm, devices must have an entry point, ‘software’ or ‘applica-tion’. We assume that devices can join an F2C system in two ways: (i) devices have the application or software copy installed, or; (ii) devices must connect to another device, running the application or software copy. According to the F2C architecture, two types of entry point are identi?ed for F2C resources: (i) one for cloud resources, and; (ii) another one for the fog resources. This characteristic must also be considered to classify F2C resources. Finally, also the operating system information and other installed apps and APIs information will help classify them. – Device type - Devices participating in an F2C system can be either phys-ical or virtual device. Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems 695 – Networking information - According to the large diversity of devices envi-sioned in an F2C system; devices are expected to use several di?erent networking standards and technologies (i.e., wi?, Bluetooth, etc.). Hence, information about the networking standards and supported technologies must also be considered to classify F2C resources. Finally, being a key attribute in the networking arena, we identify bandwidth as a key param-eter to characterize F2C resources as well. 2. IoT components & Attached components information - The resources working in an F2C system may have some sensors, actuators, RFID tags and other attached-device components (i.e., webcam, printer, etc.). Therefore, resources can be further classi?ed according to the information of sensors, actuators, RFID tags and other attached device components. – Sensors - F2C resources may have attached various kind of sensors (i.e., temperature sensor, proximity sensor, etc.). Therefore, this information must also be considered. – Actuators - Similar to the previous one, many di?erent actuators may be attached to F2C resources (i.e., Mechanical, Thermal or Magnetic etc.). Hence, similarly, this information must also be considered. – RFID tags - F2C resources may also have the active or passive type of RFID tags attached, so to be considered as well. – Other attached device components - Many di?erent external devices may be connected to an F2C device (i.e., Webcam, external audio system, printer, scanner, Arduino kit etc.). This information is enriching the whole system; thus it must be undoubtedly considered to classify an F2C resource. 3. Security & Privacy aspects - To build an e?cient system, it is essential to iden-tify the set of system resources requiring some protection and those requiring not to be protected. In an F2C system, according to the device hardware security, data privacy and the network security aspects, the resources can be further classi?ed as protected and insecure resources. 4. Cost information - In an F2C system, some resources are expected to o?er free access (i.e., with no cost) while some other may require some fee for granting access. Therefore, according to the accessing cost, F2C resources can be classi?ed into Chargeable and Non-Chargeable resources. 5. 
History & Behavioral information - Beyond considering information about resources attributes and components, resources in an F2C system may also be classi?ed according to the information of their present and past system interaction, including resource reliability, life-span, mobility information, par-ticipation role and information of their location. Based on the above analysis, this paper considers the following ?ve classes, device attributes, information of IoT components and other attached devices, cost infor-mation, security and privacy aspects, history and behavioural information, to categorize resources in the F2C system. 696 S. Sengupta et al. 5 Presenting the Resource Description Model in F2C In an F2C system, several fog areas may co-exist, as shown in Fig. 5 for an illustrative smart city scenario. Each fog area is composed of one leader fog node, various kind of fog node devices, IoT (i.e., sensors, actuators, etc.) and other elements (i.e., printer, etc.), putting together a heterogeneous set of resources as well as di?erent data sources. As earlier stated, such heterogeneity makes some challenges for the global management. Thus, as also mentioned in this paper, correct and appropriate classi?cation of resources becomes a must, to facilitate in such coordinated management. Also, it is necessary to have a clear and combined version of a generalized resource description. In this section, we de?ne a combined version of a generalized resource description for devices in an F2C system. Fig. 5. F2C scenario in a smart city. Based on the previously described ontology and matching the F2C archi-tecture, we conclude that the design of the full classes and sub-classes for each Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems 697 resource turns into a key challenge to manage the whole system resources prop-erly. Moving back to the smart city scenario depicted in Fig. 5, we may see, just as an example, that the laptop contains the classes of device attributes, IoT com-ponents & Attached components, Security & Privacy aspects, Cost information as well as History & Behaviors. Each class includes di?erent subclasses, such as Hardware components, software components, Network information etc. Also, the laptop contains a device id and a user id. To build an e?cient F2C system and to manage all the system resources properly, it is also essential to know the total capacity and attributes of each fog area. Figure 5 shows that a fog area is composed by a leader fog node, several types of fog node devices (i.e., laptop, car, smartphone) and other attached devices (i.e., printer, light, etc.). Leveraging such attributes description, we ?rst propose a generalized resource description model for an F2C system in Subsect. 5.1, and later, in Subsect. 5.2; we focus on identifying the aggregated resource information model for a particular fog area. Moreover, it is worth highlighting the fact that for an F2C system to properly work, the resource information must be stored e?ciently. To that end, it is essen-tial to have a strong but light-weight database. Also, e?ciently and guaranteed transfer of the resource description information, it is also mandatory to describe the resource information through a standard and formatted language. Consid-ering the characteristics of di?erent databases and languages and according to the proposed model, in this paper, we adopt a relational database management system -SQLite, to store the resource information. 
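As a rough illustration of this storage choice (the table layout and field names below are our own, not the schema used by the authors), a device could keep its local resource record in SQLite and serialize it for the transfer step described next; the sample values are taken from Listing 1.1.

import json
import sqlite3

# Local, light-weight store kept by each F2C-enabled device (illustrative schema).
conn = sqlite3.connect("f2c_resources.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS resource (
        device_id   INTEGER PRIMARY KEY,
        user_name   TEXT,
        description TEXT  -- full class/sub-class record, stored as JSON text
    )
""")

description = {
    "Device_attributes": {
        "Hardware_components": {"Storage_information_MB": {"Total": 122880}},
        "Software_components": {"F2C_app": "fog_resource_app"},
    },
    "History_&_Behaviors": {"Participation_role": "Both"},
}

conn.execute(
    "INSERT OR REPLACE INTO resource (device_id, user_name, description) VALUES (?, ?, ?)",
    (11078934576, "craax_user123", json.dumps(description)),
)
conn.commit()

# Read the record back and produce the formatted document shared with the leader fog node.
row = conn.execute(
    "SELECT description FROM resource WHERE device_id = ?", (11078934576,)
).fetchone()
payload = json.dumps(json.loads(row[0]), indent=2)
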
Finally, to transfer the data from one resource to another resource, in this paper we adopt JSON as the information transferring implementation language. 5.1 Generalized Resource Description Model: A Single Resource To participate in any service-oriented computing platform, devices must have an entry point or ‘software’ or ‘application’ to join in. In the F2C system, devices can join the system by two ways: (i) they have the ‘application’ or ‘software’ installed on their device, or; (ii) they connect to another device that has the ‘application’ or ‘software’. So, considering the ontology-based resources classi?cation model proposed in Sect. 4, and for the sake of illustration aligned to the smart city scenario depicted in Fig. 5, all devices in the smart city are denoted as - R, and all devices endowed with the F2C enabled ‘software’ or ‘application’ copy are denoted as - RF 2C. Hence, according to our proposed resource taxonomy of a F2C system, RF 2C ?C R. The devices that do not have the F2C enabled ‘software’ or ‘application’, can also join the F2C system through a connection with an F2C enabled device. They can be known as - ‘Other attached device components’ of 698 S. Sengupta et al. the F2C enabled-device. We present the generalized resource description model for the F2C enabled-device in a tuple form, as follows: RF 2C = < user name; device id; Device attributes: < Hardware components: < Storage information; Main memroy information; Processor information; Power source information; GPU & FPGA information >; Software components: < Apps & APIs: < F2C app: < cloud resource app; fog resource app >; Other apps & APIs >; Operating system >; Network information: < Bandwidth information; Networking standards information >; Resource type: >; IoT components & Attached components: < Sensors; Actuators; RFID tags; Other attached device components >; Security & Privacy aspects: < Device hardware security; Network security; Data privacy >; Cost information: < Chargeable device; Non-Chargeable device >; History & Behaviors: < Participation role; Mobility; Life span; Reliability; Information of the device Location; resource sharing information > > Before sharing the resource information, all resources (RF 2C) in the F2C sys-tem, keep storing a copy of their resource information according to the general-ized resource description model. Resources(RF 2C) are using their local database (i.e., SQLite) to store their resource and components information. To share the resource information e?ciently with other F2C enabled resources, we adopt the JSON language to make a standard and formatted description ?le. In the Listing 1.1, we represent the resource description ?le of an F2C enabled laptop, based on the JSON language. The description ?le contains the detailed information about the hardware (i.e., total and current available storage, RAM informa-tion), software (i.e.,OS information, F2C app information etc.), IoT and other Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems 699 attached components (sensors and other connected device information), history & behavioural (i.e., current location information, participation role etc.) etc. of the F2C enabled laptop. Listing 1.1. 
The JSON-formatted resource description ?le for a F2C-enabled laptop: An example 1 2 { 3 " user_name ":" craax_user123 ", 4 " device_id ":11078934576 , 5 " Device_attributes ": { 6 " Hardware_components ": { 7 " Storage_information_ ( _in_MB_ ) ": { 8 " Total ":122880 , 9 " Available ":965890 10 }, 11 " Main_Memory_information_ ( _in_MB_ ) ": { 12 " Total ":32768 , 13 " Available ":13968 14 }, 15 " Processor_information ": { 16 " Processor_maker ":" Intel Core i7 -8550 U CPU @ 1.80 GHz ", 17 " Available_percentage_of_processor ":90.7 18 " Processor_architecture ":" X86_64 " 19 }, 20 . 21 . 22 }, 23 " Software_components ": { 24 " Operating_system ":" Windows -10 -10.0.16299 - SP0 ", 25 " Apps_ & _APIs ": { 26 " F2C_app ":" fog_resource_app ", 27 " Other_apps_ & _APIs ": { 28 " Adobe Acrobat Reader DC ", 29 " AMD Software ", 30 . 31 . 32 } 33 } 34 }, 35 . 36 . 37 }, 38 " IoT_components_ & _attached_device_components ": { 39 . 40 . 41 }, 42 . 43 . 44 } 5.2 Resource Description Presentation: Aggregated Model in Each Hierarchy of F2C As shown in Fig. 5 several fog areas may be included in a smart city, each of them providing F2C services to the citizens. The policies used to de?ne the fog areas are out of the scope of this paper. However, it is pretty apparent that correct management of the whole set of resources in fog areas is essential to make the F2C system to be accurate and e?cient. Unfortunately, since each fog area is built by distinct resources not only in quantity but also in typology, the capacity of processing, storage, power and networking techniques may di?er for each individual fog area, thus endowing each particular fog area with distinct characteristics and features. This scenario makes the management of all fog 700 S. Sengupta et al. areas notably challenging, thus di?culting the objective of building an e?cient F2C system. To mitigate this problem, a clear description of the entire set of capacities and characteristics of each individual fog area is mandatory. Fig. 6. Resource information sharing: from Fog to Cloud. Previously we de?ned that, in the F2C system, devices those are sharing their resources can participate in the system as - ‘Contributor’, or ‘Both’. Let’s consider the Fig. 6 as an illustrative scenario to depict that cooperative sce-nario. We may see that ‘Fog Area1’, contains one leader fog node and two fog node devices (i.e., smartphone, laptop) along with other connected devices (i.e., printer, bulb etc.). Let’s consider that the two fog node devices and the leader fog node are participating in the system as ‘Both’. In this case, the two fog node devices are sharing their resource information with the leader fog node. Thus, once the leader fog node receives the resource information for the two fog node devices, it aggregates all the information along with its own resource components information to form the resource information for the particular fog area. Then, the leader fog node shares this aggregated information to the higher layer in the F2C architecture. To make it work an strategy to aggregate the resources information must be de?ned. To that end, next, we propose a general-ized aggregated resource description model for the F2C system. 
We identify the Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems 701 aggregated resource description model as aRDF 2C, and its structure is described as following: aRDF 2C = < fog node id; fog area id; total number of the attached F2C enabled resources; main memory capacity info ( in MB ): < total available main memory; F2C resource with highest main memory; F2C resource with lowest main memory >; storage capacity info ( in MB ): < total available storage size; F2C resource with highest storage size; F2C resource with lowest storage size >; processor info: < processing capacity info ( in percentage ): < average of processing capacity; F2C resource with highest processing; F2C resource with lowest processing >; processor core info ( number of cores ): < average of total number of cores; F2C resource with highest processor core; F2C resource with lowest processor core > >; gpu capacity ( in MB ): < total available gpu capacity; F2C enabled resource with highest gpu; F2C enabled resource with lowest gpu >; power info remaining time ( in seconds ): < average time of power remain; F2C resource with highest power remain; F2C resource with lowest power remain >; IoT & other attached devices info: < sensors type info; actuators type info; RFID tag type info; other attached device info; >; Security & Privacy score: < average score for F2C resource; F2C enabled resource with highest score; F2C enabled resource with lowest score > > By following this aggregated resource information, it can be easily drawn that it is quite di?erent from the generalized resource description model of a single F2C resource. After getting all the resource information of a fog area, the leader fog node of the respective area is aggregating all of the information, and it is making an aggregated description ?le according to the upper mentioned model. The aggregated description ?le only contains the information about leader fog node id, fog area id, total number of fog nodes, the total capacity of main memory, storage, GPU etc., information about the highest and lowest main memory, storage, processing, GPU capacity of the F2C enabled fog node of the respective fog area and so on. Then after creating the aggregated resource information model, the leader fog node share this information with the upper layer resources of the F2C paradigm. 702 S. Sengupta et al. 6 Conclusion In this paper, we start highlighting the need to de?ne a resources model to ease the management of the F2C system. To that end, we begin presenting a taxonomy for F2C resources. Leveraging the taxonomy along with the recent literature, we propose an ontology-based resource description model for the F2C system, where resources are described by device attributes, IoT components and attached components, security and privacy aspects, cost information, and histor-ical and behavioural information classes. The proposed model is illustrated in a smart city scenario for the sake of understanding. And ?nally, in this paper, we have also introduced the model for a generalized aggregated resources descrip-tion ?le, aimed at sharing the resource information of a particular fog area. This work is presented as the ?rst step towards a comprehensive resource categoriza-tion system which is considered as mandatory for an e?cient F2C management framework. Still, many challenges remain to be addressed. 
For example, consid-ering active/non-active resources in the aggregated information, or even more interesting, de?ning a strategy to implement the resource sharing as described in the F2C. Even the classi?cation of the F2C resources will help us to ?nd out the proper resources to map with services in the F2C paradigm. Implicitly, this work will help us to de?ne the cost-model for the F2C resources, and that will also help us to ?nd out some optimal solution for choosing the resources to execute some tasks and provide some services. Thus, these challenges, as well as many other open issues, will constitute the core of our future work as a follow up of this paper. Acknowledgment. This work was supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund, under contract TEC2015-66220-R(MINECO/FEDER), and by the H2020 EU mF2C project reference 730929. References 1. Department of Economic and Social A?airs. World Urbanization Prospects The 2014 Revision - Highlights. United Nations (2014). https://esa.un.org/unpd/wup/ publications/?les/wup2014-highlights.pdf. ISBN 978-92-1-151517-6 2. Ismail, N.: What will the smart city of the future look like? Information Age Magazine, 21 September 2017. http://www.information-age.com/will-smart-city-future- look-like-123468653/ 3. van der Meulen, R.: Gartner Says 8.4 Billion Connected “Things” Will Be in Use in 2017, Up 31 Percent From 2016. Press Release by the Gartner, Inc. (NYSE: IT), 7 February 2017. https://www.gartner.com/newsroom/id/3598917 4. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., Ayyash, M.: Internet of Things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutor. 17(4), 2347–2376 (2015) 5. Manyika, J., Woetzel, J., Dobbs, R., Chui, M., Bisson, P., Bughin, J., Aharon, D.: Unlocking the potential of the Internet of Things. McKinsey&Company, June 2015. https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/ the-internet-of-things-the-value-of-digitizing-the-physical-world Taxonomy and Resource Modeling in Combined Fog-to-Cloud Systems 703 6. Cisco Systems Inc.: New Cisco Internet of Things (IoT) System Provides a Foundation for the Transformation of Industries. Cisco News, 29 June 2015. https://investor.cisco.com/investor-relations/news-and-events/news/news-details/ 2015/New-Cisco-Internet-of-Things-IoT-System-Provides-a-Foundation-for- the-Transformation-of-Industries/default.aspx 7. Martin, R.: The Installed Base of Smart Meters Will Surpass 1 Billion by 2022, Posted in the Newsroom of the Navigant Research, 11 November 2013 8. Ahmed, E., Yaqoob, I., Gani, A., Imran, M., Guizani, M.: Internet-of-Things-based smart environments: state of the art, taxonomy, and open research challenges. IEEE Wirel. Commun. 23(5), 10–16 (2016) 9. Mahmud, R., Buyya, R.: Fog computing: a taxonomy, survey and future directions. In: Internet of Everything, pp. 103-130. Springer (2018) 10. Bonomi, F., Milito, R., Natarajan, P., Zhu, J.: Fog computing: a platform for internet of things and analytics. In: Big Data and Internet of Things: A Roadmap for Smart Environments, pp. 169–186. Springer (2014) 11. Masip-Bruin, X., Marin-Tordera, E., Jukan, A., Ren, G.J., Tashakor, G.: Foggy clouds and cloudy fogs: a real need for coordinated management of fog-to-cloud (F2C) computing systems. IEEE Wirel. Commun. Mag. 23(5), 120–128 (2016) 12. 
Sengupta, S., Garcia, J., Masip-Bruin, X.: A literature survey on ontology of di?er-ent computing platforms in smart environments. arXiv preprint arXiv:1803.00087 (2018) 13. Perera, C., Qin, Y., Estrella, J.C., Rei?-Marganiec, S., Vasilakos, A.V.: Fog com-puting for sustainable smart cities: a survey. ACM Comput. Surv. (CSUR) 50(3), 32 (2017) 14. Dorsemaine, B., Gaulier, J.-P., Wary, J.-P., Kheir, N., Urien, P.: Internet of Things: a de?nition & taxonomy. In: 2015 9th International Conference on Next Generation Mobile Applications, Services and Technologies, pp. 72–77 (2015) 15. Vaithiya, S., Bhanu, M.S.: Ontology based resource discovery mechanism for mobile grid environment. In: 2013 2nd International Conference on Advanced Computing, Networking and Security (ADCONS), pp. 154–159 (2013) 16. Karaoglanoglou, K., Karatza, H.: Directing requests in a large-scale grid system based on resource categorization. In: 2011 International Symposium on Perfor-mance Evaluation of Computer & Telecommunication Systems (SPECTS), pp. 9–15 (2011) 17. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a vision, architectural elements, and future directions. Futur. Gener. Comput. Syst. 29(7), 1645–1660 (2013) 18. Arianyan, E., Ahmadi, M.R., Maleki, D.: A novel taxonomy and comparison method for ranking cloud computing software products. Int. J. Grid Distrib. Com-put. 9(3), 173–190 (2016) 19. Parikh, S.M., Patel, N.M., Prajapati, H.B.: Resource management in cloud com-puting: classi?cation and taxonomy. arXiv preprint arXiv:1703.00374 (2017) 20. Zhang, M., Ranjan, R., Haller, A., Georgakopoulos, D., Menzel, M., Nepal, S.: An ontology-based system for cloud infrastructure services’ discovery. In: 2012 8th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), pp. 524–530 (2012) 21. Baccarelli, E., Naranjo, P.G.V., Scarpiniti, M., Shojafar, M., Abawajy, J.H.: Fog of everything: energy-e?cient networked computing architectures, research chal-lenges, and a case study. IEEE Access 5, 9882–9910 (2017) 704 S. Sengupta et al. 22. Moscato, F., Aversa, R., Di Martino, B., Forti¸s, T.-F., Munteanu, V.: An analysis of mOSAIC ontology for cloud resources annotation. In: 2011 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 973–980 (2011) 23. Botta, A., de Donato, W., Persico, V., Pescap`e, A.: Integration of cloud computing and internet of things: a survey. Futur. Gener. Comput. Syst. 56, 684–700 (2016) 24. Marin-Tordera, E., Masip-Bruin, X., Garcia, J., Jukan, A., Ren, G.J., Zhu, J.: Do we all really know what a Fog Node is? Current trends towards an open de?nition. Comput. Commun. 109, 117–130 (2017) 25. Gomez-Perez, A., Fernandez-Lopez, M., Corcho, O.: Ontological engineering: with examples from the areas of knowledge management, e-commerce and the semantic web. Data Knowl. Eng. 46(1), 41–64 (2003) Predicting Head-to-Head Games with a Similarity Metric and Genetic Algorithm Arisoa S. Randrianasolo1(B) and Larry D. Pyeatt2 1 Lipscomb University, Nashville, TN, USA arisoa.randrianasolo@lipscomb.edu 2 South Dakota School of Mines and Technology, Rapid City, SD, USA larry.pyeatt@sdsmt.edu Abstract. This paper summarizes our approach to predict head to head games using a similarity metric and genetic algorithm. The prediction is performed by simply calculating the distances of any two teams, that are set to play each other, to an ideal team. The nearest team to the ideal team is predicted to win. 
The approach uses genetic algorithm as an optimization tool to improve the accuracy of the predictions. The optimization is performed by adjusting the ideal team’s statistical data. Soccer, basketball, and tennis are the sport disciplines that are used to test the approach described in this paper. We are comparing our pre-dictions to the predictions made by Microsoft’s bing.com. Our ?ndings show that this approach appears to do well on team sports, accuracies above 65%, but is less successful for predicting individual sports, accu-racies less than 65%. In our future work, we plan to do more testing on team sports as well as studying the e?ects of the di?erent parameters involved in the genetic algorithm’s setup. We also plan to compare our approach to ranking and point based predictions. Keywords: Sports predictions · Similarity calculation Genetic algorithm 1 Introduction International sport competitions, professional sports, college sports, and even regional and city tournaments now keep track of various data about the teams involved in the competitions. Those data can be available right away as the games progress, or may be extracted later by some experts after reviewing the video of the games. The challenge is ?nding ways to make use of the available data. Is there enough information in the data to predict the outcomes of future games? What algorithm and calculations can be utilized to predict the outcomes of future games? Those are some of the questions that teams and coaches may have after receiving their statistical data from a tournament. .a c Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 705–720, 2019. https://doi.org/10.1007/978-3-030-02686-8_53 706 A. S. Randrianasolo and L. D. Pyeatt In this paper, we summarize our approach to predicting the outcomes of head to head games in tournaments. Our approach di?ers from others because it is not utilizing all the possible historical data that can be gathered about the teams that are involved. It is also not taking in consideration the past performance of the teams in the same competition from previous years or previous matches. We restrict the data that we are using to perform the prediction to only consist of the most recent teams’ statistics in the tournament of interest. This restriction of the data is based on the assumption that the performance in the current tournament of interest is most indicative of the current strength of the teams. Also, by using this restriction, this approach can be used in tour-nament settings where teams do not necessarily know much about each other before hand. This latter reason is our main motivation for this research. Our approach uses a similarity metric over the most recent statistical data of the teams involved in the tournament to predict the outcomes of head to head games. To improve the predictions, we use a genetic algorithm as an optimization mechanism. This paper will cover some of the previous work done in terms of head to head game predictions. Then, it will explain our early observation in predicting head to head games. The forth section of the paper will cover the approach that we are proposing. This will be followed by the testing and the results of our experiments. The last section of this paper will contain our conclusions and future work. 2 Related Work The idea of predicting the outcome of a pairwise sport matchup is a research topic for many investigators. 
Chen and Joachims explained the use of a general probabilistic framework for predicting the outcome of pairwise matchups using the blade-chest model [1,2]. A player or a team was represented by a blade vector and a chest vector. The winning and losing probabilities were decided based on the distance between one player’s blade to his opponent’s chest and vice versa. The blade and chest vector were extracted from the player’s data and the game features. This approach trained on historical data to tweak the parameters involved in the model by maximizing the log-likelihood of the probability of the known winner. Machine learning is also used widely in sport predictions. In most of these cases, as in the approach described previously, a considerable amount of historical data is needed to train the model. For example, Pretorius and Parry trained a random forest on past rugby games in order to predict the 2015 Rugby World Cup [3]. The accuracy of the predictions made by their system was no di?erent than the prediction made by human agents on the 2015 Rugby World Cup. Brooks, Kerr, and Guttag trained an SVM to predict if possessions will result in shots in soccer [5]. The approach was applied on the Spanish La Liga soccer league using the data from the 2012–2013 season. It had an Area Under the ROC (Receiver Operating Characteristic) curve of 0.79. Microsoft’s Bing Predicts [11] Predicting Head-to-Head Games 707 also claims to use machine learning in its prediction. Bing Predicts claimed a 63.5% accuracy on predicting the 2016 NCAA March Madness and a 75% prediction accuracy on the 2015 Women’s Soccer World Cup. Evolutionary systems are also used in sport and matchup predictions. Soares and Gilbert used a Particle Swarm Optimizer (PSO) to predict Cross-country results [4]. Their approach transformed the team features from historical data into a set of rankings. The rankings were multiplied by weights to produce the ?nal rankings. The ?nal rankings were then evaluated from the results of the cross-country meets as follows: A team received 1 point for each team it beat if it was ranked ahead of that team, and received 1 point for each team it lost to if it was ranked behind that team. A team received 0 points for each team in which the opposite of either case above happened [4]. The goal of this approach was to maximize the points earned through producing the ?nal rankings used in the predictions, and the way to do so is to optimize the weights using a PSO. Another approach that uses ranking as a way to predict performance is to create a complex-network based on di?erent measures, such as clustering coe?- cient and node degree [6,7]. With this approach, a team sports league is viewed as a network of players, coaches, and teams in evolution. The network was used to predict teams’ behavior and to predict rankings. The rankings could be used to predict the league’s winner. This approach was applied to NBA (National Bas-ketball Association) and MLB (Major League Baseball) data and has achieved a 14% rank prediction accuracy improvement over its best competitor [7]. The ?rst di?culty in using many of these approaches resides in ?nding the appropriate functions or transforms that can extract the needed information from the historical data. Our approach uses a simple similarity metric and the well known genetic algorithm to create the predictions. The second di?culty arises from the struggle of ?nding enough data to train the model. 
In well known competitions with well known teams, ?nding historical data is not a problem. However, in less known competitions, such as regional or city or invitational or small tournaments, ?nding historical data is not always possible. This is the reason why we restrict the data that we are using to perform the predictions to only consist of the most recent teams’ statistics in the tournament of interest. We apply this restriction in all of the tournaments that we predicting regardless of whether they are well known or not. 3 Early Observation This research started because of a soccer coach who came to us with all sorts of data about his team, and was struggling to ?nd a way to use it to his team’s advantage. The data that we received had no information about the other teams in the division, so we could not do much in predicting head to head outcomes. To continue this research, we started exploring publicly available data from other sports competitions. Our early observation lead us to notice that teams work to improve some trackable features in the game. For example, in soccer, a team may try to maxi-mize its ball possession time or possession percentage and minimize the amount 708 A. S. Randrianasolo and L. D. Pyeatt of red cards that its players receive. In basketball, for example, a team may try to minimize its turnover rate and maximize its three-points percentage. The teams’ statistics data can be represented in a vector format. This observation lead us to begin considering the idea of an ideal team. This ideal team has the statistics that all teams, in a particular sport of interest, try to reach. The values for the features in the ideal team’s vector can be hard to reach for some teams. These values may even be impossible, but they should represent what a perfect team should look like in the sport of interest. Now that we have teams vectors and an ideal team vector, we can start working on predictions. Fig. 1. Similarity calculation. The prediction is done simply by computing the similarity of each team to the ideal team. A simple illustration of this idea is expressed in Fig. 1. Since the data are vectorized, a distance or similarity calculation is not hard to compute, and there are several distance measures that could be used. Given two teams that are due to play in a head to head game, we predict that the nearest one to the ideal team, represented by the ideal vector, will win the game. 3.1 Early Testing and Results We started testing our approach on three competitions in 2016. The test com-petitions were, the 2016 U.S. Open (tennis), the 2016 FIBA Africa Under 18 (basketball), and the 2016 UEFA European Championship (soccer) also known as “euro 2016”. The 2016 FIBA Africa Under 18 was the ideal setup to test our approach. The teams in that competition did not appear to have much infor-mation about each other, and somehow had to utilize the statistics about the other teams in order to know their winning chances and to create strategies. The drawback of using this particular basketball competition was that it was not a well known competition. We were not be able to compare our predictions Predicting Head-to-Head Games 709 to other live predictions. This was the reason why we tested our approach to the 2016 U.S. Open and the 2016 UEFA European Championship competitions. In the 2016 U.S. Open, we used the data from rounds one through four to predict the quarter?nals. Then, we utilized the data from rounds one through four plus the quarter?nals to predict the semi?nals. 
Finally, we employed the data from rounds one through four plus the quarterfinals and the semifinals to predict the finals. In the 2016 FIBA Africa Under 18, we used the data from the group stage to predict the quarterfinals. Then, we followed the same procedure as in the 2016 U.S. Open. In the 2016 UEFA European Championship, we also utilized the data from the group stage to predict the round of 16, and then we followed the same approach as in the previous two sports mentioned above. The features used during this early testing are shown in Table 1.

Table 1. Features used in early testing.
2016 U.S. Open: sets played, tie breaks played, total games, total aces, total double faults, 1st serves in %, 1st serve points won %, 2nd serve points won %, return games won, winners, unforced errors.
2016 FIBA Africa: points per game, field goal attempts, field goal %, 3-points attempts, 3-points %, free throw attempts, free throw attempts %.
2016 UEFA Euro: total corner for, total corner against, offside, fouls committed, fouls suffered, yellow cards, red cards, pass completed, ball possession %, total attempt, attempt on target, attempt off target, attempt blocked, attempt against woodwork, total goals, total goals against.

There was no specific study done in choosing the predictors during the early observation part of this research. We used our knowledge about these three different sports in choosing those predictors. We also used our knowledge about these sports in selecting the ideal vectors. An in-depth study on how to pick the predictors was left to the next phase of this research, which is summarized in the next section.

The ideal vector for the 2016 U.S. Open Men's competition was: (3, 0, 18, 20, 0, 100, 100, 100, 9, 80, 0). The ideal vector for the 2016 U.S. Open Women's competition was: (2, 0, 12, 20, 0, 100, 100, 100, 6, 80, 0). The ideal vector for the 2016 FIBA Africa Under 18 was: (150, 150, 80, 50, 50, 50, 80). The ideal vector for the 2016 UEFA European Championship was: (100, 0, 0, 0, 100, 0, 0, 100, 100, 200, 200, 0, 0, 0, 60, 0).

In our early exploration, we used three different similarity or distance measures: Cosine distance, Manhattan distance (L1-norm), and Euclidean distance (L2-norm). The prediction accuracy, from 0 to 1 (0% to 100%), of each of these three distance metrics is captured in Fig. 2. We compared our predictions to the predictions from Microsoft's Bing Predicts. The results of this comparison are shown in Fig. 3.

Fig. 2. Comparison of similarity measures.

Fig. 3. Comparison with Bing.com.

4 Prediction Method

4.1 Choosing a Similarity Metric

Our early exploration seems to indicate that switching the similarity metric based on the sport event is possibly the way to proceed. However, we want to create a general approach that will work for any type of sport. We locked our choice to using Cosine distance as our similarity metric for the rest of this research. The reasoning for this choice is that, out of the combined predictions (U.S. Open Men + U.S. Open Women + 2016 UEFA European Championship) recorded in Fig. 3, the accuracy for Cosine distance was 18/30, which was similar to the Manhattan distance's accuracy, while the combined accuracy for the Euclidean distance was 17/30. We did not break the tie between Cosine and Manhattan; we just picked one to go with.

4.2 Effect of the Ideal Vector

Our early observation has also pointed out that a change in the ideal vector will affect the predictions.
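To make the prediction step itself concrete, the short sketch below (Python) computes a team's Cosine, Manhattan, and Euclidean distance to an ideal vector and predicts the team closest to the ideal to win. The three-feature team vectors, the ideal values, and the function names are invented for illustration and are not taken from any of the tournaments above.

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means closer to the ideal vector
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def manhattan_distance(a, b):
    return np.sum(np.abs(a - b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def predict_winner(team_a, team_b, ideal, metric=cosine_distance):
    """Predict the winner of a head-to-head game: the team whose
    statistics vector lies nearest to the ideal vector."""
    name_a, stats_a = team_a
    name_b, stats_b = team_b
    return name_a if metric(stats_a, ideal) < metric(stats_b, ideal) else name_b

# Hypothetical soccer-style features: (ball possession %, shots on target, red cards)
ideal = np.array([100.0, 20.0, 0.0])
team_a = ("Team A", np.array([62.0, 7.0, 0.0]))
team_b = ("Team B", np.array([48.0, 5.0, 1.0]))

for metric in (cosine_distance, manhattan_distance, euclidean_distance):
    print(metric.__name__, "->", predict_winner(team_a, team_b, ideal, metric))
```

With these toy numbers all three metrics happen to agree; on the real tournament data they did not always do so, which is what Fig. 2 compares.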
In the early observation, we used our personal knowledge about the sports that we were dealing with to set up the ideal vectors. We do not claim to be an expert in these sports or the competitions that we dealt with in the early observation, and the ideal vectors that we picked could be erroneous. Also, we want the ideal vector to be in close relationship with the trend in the tournament. In one tournament, for example, a ball possession of 60% could be 712 A. S. Randrianasolo and L. D. Pyeatt enough to win the tournament. While in another tournament, a ball possession of 80% may be needed to win. This prompted us to employ an optimization strategy to improve the ideal vector. 4.3 Approach Our approach is summarized by Fig. 4. It starts with an input ?le containing the statistics from the early rounds of the tournament and a starting ideal vector that we manually selected based on what we think an ideal statistics should look like for an ideal team. The approach, then, makes its ?rst set of predictions based on the next set of games that are to played in the tournament. The predictions are compared to the observed outcomes to obtain the accuracy of the ideal vector. Next, a genetic algorithm is called to optimize the ideal vector [8–10]. The genetic algorithm utilizes the same input ?le containing the team statistics and the observed outcomes to calculate the ?tness of each candidate ideal vector. The best ideal vector is saved for the next set of predictions. Fig. 4. The overall approach. For the second set of predictions, the approach utilizes the best ideal vector produced by the genetic algorithm in the ?rst optimization and the team statis-tics from the beginning of the tournament up to the most recent games. For the third set of predictions, the approach uses the best ideal vector produced by Predicting Head-to-Head Games 713 the genetic algorithm in the second optimization and the team statistics from the beginning of the tournament up to the most recent games. The approach continues in this manner until the approach produces the last set of predictions, after which no further optimization is required. As a tournament moves from one round to the next, there are usually fewer games to predict. This means that the accuracy of any prediction methods can potentially go down from one round to the next as the tournament progress. This is another reason why we use a genetic algorithm optimization between rounds so that the approach can learn the trend or the pattern from the previous rounds to better predict the next round. 4.4 Short Introduction to Genetic Algorithms A genetic algorithm is a search and an optimization process inspired from biology. It is based on the survival of the ?ttest. In a genetic algorithm, a potential solution is called an individual. An individual is, most of time, expressed as a string of characters. The set of individuals is known as a population. Each individual in the population has a ?tness value. This value indicates the individual’s quality of being a solution to the problem. Individuals in the population are allowed to mate to produce new solutions. The mating part of the algorithm is known as a crossover. During a crossover, two individuals exchange characters to form a new string. Individuals that par-ticipate in crossovers are selected by a process that is based on their ?tness. The more ?t individuals have higher chances to participate in crossovers. The eventual exchange of characters is governed by a crossover probability. 
This probability determines whether the exchange is allowed to happen or not. Individuals in the population can also mutate with a defined probability known as the mutation probability. The mutation is usually performed by altering one or more characters of the string that represents an individual. In each iteration, the algorithm attempts to create new individuals. The algorithm halts when an individual with the desired fitness is generated, or when the maximum number of allowed iterations is reached. Other halting conditions can also be adopted.

4.5 Genetic Algorithm Setup

The individuals in the population are candidate ideal vectors. The population size is fixed to 100 for our experiments, and the probability of crossover is set to 60%. A roulette wheel selection approach is used to select the parents for the crossover. Other selection approaches exist, and we plan to study those more in our future work. The crossover is performed at a fixed point, which is always at the middle of the candidate ideal vectors. The probability of mutation is 0.1%. The mutation is performed by either adding 1, with a probability of 50%, or subtracting 1, with a probability of 50%, to each of the values of a candidate ideal vector whose range is greater than or equal to 5. It is performed by adding or subtracting 0.1 with equal probability for values whose range is less than 5. Each candidate ideal vector is used to predict the set of games that just happened, for which the observed outcomes are available. The fitness of each candidate ideal vector is simply its accuracy on the games that just happened. The genetic algorithm is allowed to generate 1200 new individuals before it stops. Survival of the fittest is then used to place a new individual in the population. Our genetic algorithm approach was modeled after the approach described by Goldberg [9].

5 Testing and Results

We revisited the competitions from the early observation with this new proposed approach. The results are captured by Fig. 5. Since there is some randomness in generating the population in the genetic algorithm, we ran the approach 51 times on each set of games that it tried to predict. We then used a majority rule between any two teams going head to head to see which one was mentioned most often as the winner across the 51 prediction attempts. We chose 51, an odd number, because we are interested in a win-or-lose situation and not a draw. There appears to be an improvement in predicting the men's U.S. Open tournament and a slight improvement on the 2016 UEFA European Championship, so we tested the approach on two other tournaments: the 2016–2017 UEFA Champions League and the 2017 Australian Open. Before proceeding to use the approach, we ran a correlation analysis on the predictor variables to help us in choosing the features for the ideal vectors and the vectors for each team. Figure 6 has the correlation plot for the 2016–2017 UEFA Champions League competition, and Fig. 7 has the correlation plot for the 2017 Australian Open competition. Table 2 shows the final features for the team vectors and the ideal vectors that were used in the testing. Tables 3 and 4 show the ranges of the possible values for each feature in the ideal vectors for the two competitions. The starting ideal vector for the 2016–2017 UEFA Champions League was: (60, 0, 200, 0, 0, 0, 100, 100, 100, 100, 0, 100, 0, 0). The starting ideal vector for the 2017 Australian Open Men's competition was: (1, 80, 1, 90, 100, 30, 1, 100, 100). The starting ideal vector for the 2017 Australian Open Women's competition was: (0, 80, 0, 80, 100, 20, 0, 100, 100).
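Before looking at the results, the optimization loop of Sect. 4.5 can be sketched as follows (Python). The fitness of a candidate ideal vector is its prediction accuracy on the games already played, the crossover point is fixed at the middle of the vector, and the mutation steps are plus or minus 1 or 0.1 depending on the width of the feature's range, as described above. The weighted-sampling form of roulette-wheel selection, the replace-the-worst survival rule, and the toy game data are simplifying assumptions for illustration, not the exact experimental setup.

```python
import random
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def accuracy(ideal, games):
    # Fitness of a candidate ideal vector: fraction of completed games whose
    # winner was the team nearer (in Cosine distance) to that candidate.
    correct = 0
    for stats_a, stats_b, a_won in games:
        predicted_a_wins = cosine_distance(stats_a, ideal) < cosine_distance(stats_b, ideal)
        correct += (predicted_a_wins == a_won)
    return correct / len(games)

def optimize_ideal(games, ranges, pop_size=100, children=1200,
                   p_crossover=0.6, p_mutation=0.001):
    # Evolve the ideal vector; `ranges` lists (low, high) per feature.
    population = [np.array([random.uniform(lo, hi) for lo, hi in ranges])
                  for _ in range(pop_size)]
    for _ in range(children):
        fitnesses = [accuracy(ind, games) for ind in population]
        weights = [f + 1e-9 for f in fitnesses]          # roulette-wheel weights
        p1, p2 = random.choices(population, weights=weights, k=2)
        child = p1.copy()
        if random.random() < p_crossover:                # crossover at the middle
            mid = len(child) // 2
            child[mid:] = p2[mid:]
        for i, (lo, hi) in enumerate(ranges):            # per-value mutation
            if random.random() < p_mutation:
                step = 1.0 if (hi - lo) >= 5 else 0.1    # +/-1 or +/-0.1 by range width
                child[i] += step if random.random() < 0.5 else -step
                child[i] = min(max(child[i], lo), hi)
        worst = int(np.argmin(fitnesses))                # survival of the fittest:
        if accuracy(child, games) > fitnesses[worst]:    # the child replaces the worst
            population[worst] = child                    # individual if it is better
    final = [accuracy(ind, games) for ind in population]
    return population[int(np.argmax(final))]

# Toy data: (team A stats, team B stats, did A win), with three features per team
games = [(np.array([60.0, 6.0, 0.0]), np.array([45.0, 3.0, 1.0]), True),
         (np.array([50.0, 4.0, 0.0]), np.array([55.0, 7.0, 0.0]), False)]
ranges = [(0.0, 100.0), (0.0, 20.0), (0.0, 5.0)]
print(optimize_ideal(games, ranges, pop_size=20, children=200))
```

In the experiments above the search starts from a manually selected ideal vector, whereas this sketch initializes the candidates at random within the feature ranges, and the demo call uses smaller population and iteration counts than the 100 and 1200 used in the study.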
The accuracy of the predictions can be seen in Fig. 8. Over the eleven competitions that we have been predicting so far, we also tracked how this approach performed as it moved from the first round of predictions to the next rounds. Some competitions had more rounds than others; however, they all had at least three rounds. The accuracy of the predictions from the first three rounds is summarized in Fig. 9.

Fig. 5. Revisit of the early observations.

Fig. 6. Correlation for the 2016–2017 UEFA Champions League.

Fig. 7. Correlation for the 2017 Australian Open.

Table 2. Features used in the testing.
2016–2017 UEFA: total goals, total goal against, attempt on target, attempt off target, attempt blocked, attempts against woodwork, pass completion percentage, ball possession, total corner for, cross completion, fouls committed, fouls suffered, yellow cards, red cards.
2017 Australian Open: tie break, winners, unforced errors, service points won, percentage of first serve in, aces, double fault, percentage of 1st serve point won, percentage of 2nd serve point won.

Table 3. Range of values in the ideal vector for the 2016–2017 UEFA Champions League.
Total goals: 0–40. Total goal against: 0–40. Attempt on target: 10–90. Attempt off target: 10–90. Attempt blocked: 5–60. Attempts against woodwork: 0–10. Pass completion percentage: 50–100. Ball possession: 30–70. Total corner for: 0–10. Cross completion: 5–90. Fouls committed: 50–200. Fouls suffered: 50–200. Yellow cards: 5–30. Red cards: 0–3.

Table 4. Range of values in the ideal vector for the 2017 Australian Open (Men / Women).
Tie break: 0–2 / 0–1. Winners: 90–100 / 0–50. Unforced errors: 0–2 / 0–50. Service points won: 90–100 / 0–70. Percentage of first serve in: 90–100 / 50–100. Aces: 20–40 / 0–20. Double fault: 0–2 / 0–10. Percentage of 1st serve pt. won: 90–100 / 50–100. Percentage of 2nd serve pt. won: 90–100 / 50–100.

Fig. 8. Performance on the 2016–2017 UEFA Champions League and the 2017 Australian Open.

Fig. 9. Performance from one round to the next.

6 Conclusion and Future Work

In this paper, we have summarized our approach to predicting head to head games using only the statistical data describing what the teams have been doing in the tournament of interest. Our approach is aimed at predicting local or regional competitions, where little or no historical data is available, by using a simple similarity metric and the well known genetic algorithm. Individual sports are more difficult to predict than team sports. Injuries, emotions, fatigue, and other factors have a greater effect on individuals than they do on teams. For individual sports, these factors must be taken into consideration to improve the prediction. Taking social media input (similar to what Microsoft's bing.com [11] claims to be doing) or using additional data about each game, such as time of day, weather, or public support (similar to what was done by Chen and Joachims [1,2]), can be beneficial. Even in the work by Chen and Joachims [1,2], predictions are still only around 60% and 70% in tennis. Team performances in collective sports appear to have more regularity, making predictions a little less difficult than for individual sports.
The performance of our approach on the 2016 FIBA Africa Under 18, the 2016 UEFA European Championship, and the 2016–2017 UEFA Champions League, indicates that it has the potential to do well for predicting the outcomes of team and collective sports head to head games. We plan to test this approach on more team sports in the future. Our future goals also include ?nding a way to automatically infer the initial ideal vectors from the initial data rather than depending on a human agent to generate them. We also plan to engage in a more detailed analysis of the parameters involved in the genetic algorithm. This will involve exploring di?erent selection approaches and experimenting with the crossover and the mutation probability. Our aim in this endeavor is not only to improve the accuracy but also to uncover the reason for the slight drop of performance between the second rounds and the third rounds of predictions as we can see from Fig. 9. We also plan to compare our predictions to ranking based and point based predictions. References 1. Chen, S., Joachims, T.: Predicting matchups and preferences in context. In: Pro-ceedings of the 22nd ACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining, KDD 2016, San Francisco, California, USA, pp. 775–784. ACM, New York (2016) 2. Chen, S., Joachims, T.: Modeling intransitivity in matchup and comparison data. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, WSDM 2016, San Francisco, California, USA, pp. 227–236. ACM, New York (2016) 3. Pretorius, A., Parry, D.A.: Human decision making and arti?cial intelligence: a comparison in the domain of sports prediction. In: Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists, SAICSIT 2016, Johannesburg, South Africa, pp. 32:1–32:10. ACM, New York (2016) 720 A. S. Randrianasolo and L. D. Pyeatt 4. Soares, C., Gilbert, J.E.: Predicting cross-country results using feature selec-tion and evolutionary computation. In: The Fifth Richard Tapia Celebration of Diversity in Computing Conference: Intellect, Initiatives, Insight, and Innovations, TAPIA 2009, Portland, Oregon, pp. 41–45. ACM, New York (2009) 5. Brooks, J., Kerr, M., Guttag, J.: Developing a data-driven player ranking in soccer using predictive model weights. In: Proceedings of the 22nd ACM SIGKDD Inter-national Conference on Knowledge Discovery and Data Mining, KDD 2016, San Francisco, California, USA, pp. 49–55. ACM, New York (2016) 6. Vaz de Melo, P.O.S., Almeida, V.A.F., Loureiro, A.A.F.: Can complex network metrics predict the behavior of NBA teams? In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, Las Vegas, Nevada, USA, pp. 695–703. ACM, New York (2008) 7. Vaz de Melo, P.O.S., Almeida, V.A.F., Loureiro, A.A.F., Faloutsos, C.: Forecasting in the NBA and other team sports: network e?ects in action. ACM Trans. Knowl. Discov. Data 6, 13:1–13:27 (2012) 8. Mitchell, M., Forrest, S.: Genetic algorithms and arti?cial life. Artif. Life 1, 267– 289 (1994) 9. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learn-ing. Addison-Wesley Longman Publishing Co. Inc., Boston (1989) 10. Holland, J.H.: Adaptation in Natural and Arti?cial Systems: An Introductory Analysis with Applications to Biology, Control and Arti?cial Intelligence. MIT Press, Cambridge (1992) 11. Bing Predicts. http://www.bing.com/explore/predicts. 
Accessed 17 July 2017 Arti?cial Human Swarms Outperform Vegas Betting Markets Louis Rosenberg(?) and Gregg Willcox Unanimous AI, San Luis Obispo, CA, USA Louis@Unanimous.AI Abstract. Swarm Intelligence (SI) is a natural phenomenon in which biological groups amplify their collective intelligence by forming dynamic systems. It has been studied extensively in bird ?ocks, ?sh schools, and bee swarms. In recent years, AI technologies have enabled networked human groups to form systems modeled on natural swarms. Referred to as Arti?cial Swarm Intelligence or ASI, this approach has been shown to signi?cantly amplify the e?ective intelligence of human groups. The present study compares the predictive ability of ASI to Vegas betting markets when forecasting sporting events. Groups of average sports fans were required to forecast the outcome of 200 hockey games in the NHL league (10 games per week for 20 weeks). The expected win rate for Vegas favorites was 62% across the 200 games based on the published odds. The ASI system achieved a win rate of 85%. The probability that the ASI system outper- formed Vegas by chance was very low (p = 0.006), indicating a signi?cant result. Researchers also compared the ROI generated from two betting models: one that wagered weekly on the top Vegas favorite, and one that wagered weekly on the top ASI favorite. At the end of the 20-week period, the Vegas model generated a 41% ?nancial loss, while the ASI model generated a 170% gain. Keywords: Swarm intelligence · Arti?cial intelligence Collective intelligence 1 Background Arti?cial Swarm Intelligence (ASI) is a powerful method for amplifying the predictive accuracy of networked human groups [1, 2]. A variety of prior studies, across a wide range of prediction tasks have demonstrated that real-time “human swarms” can produce more accurate forecasts than traditional “Wisdom of Crowds” methods such as votes, polls, and surveys [3]. For example, a study in 2015 tested the ability of human swarms to predict the outcome of college football games. The ASI system tapped the real-time intelligence of 75 amateur sports fans to predict 10 bowl games. As individuals, the participants averaged 50% accuracy when predicting outcomes against the spread. When forecasting together as a real-time ASI system, those same participants achieved 70% accuracy against the spread [2]. Similar increases have been found in other studies, including a ?ve-week study that tasked human participants, connected as an ASI system, with predicting a set of 50 soccer matches in the English Premier League. Results showed a 31% increase in accuracy when participants were connected in ASI swarms as © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 721–729, 2019. https://doi.org/10.1007/978-3-030-02686-8_54 compared to forecasting as individuals [4]. The human swarms also outperformed the BBC’s machine-model known as “SAM” over those same 50 games [5]. Although previous research has shown that ASI technology can empower human groups to outperform individual forecasters as well as traditional crowd-based methods, no formal study has been conducted to compare the predictive ability of ASI to major betting markets [6]. To address this need, the current study was conducted to rigorously compare “human swarms” to Vegas betting markets, assessing the accuracy rates and the ?nancial returns across a large set of predictions. 
Speci?cally, this largescale study required groups of sports fans to forecast the outcome of 200 games in the National Hockey League (NHL), structured as 10 games per week for 20 consecutive weeks. 1.1 From Crowds to Swarms When collecting input from human groups, the phase “Wisdom of Crowds” is generally used whenever the input is aggregated to generate output of higher accuracy. [7–9]. The basic premise, also referred to as Collective Intelligence, dates to the early 1900’s and generally involves collecting survey data from groups of individuals and computing a statistical result. When comparing “swarms” and “crowds”, the primary di?erence is that in crowd-based systems, the participants provide isolated input that is aggregated in external statistical models, whereas in swarm-based systems the participants interact in real-time, “thinking together” as a uni?ed system. In other words, crowds are statis- tical constructs while swarms are closed-loop systems in which the participants act, react, and interact in real-time, converging together on optimized solutions. ASI systems are generally modeled on biological systems such as ?sh schools, bird ?ocks, and bee swarms. The present study uses Swarm AI technology from the company Unanimous AI. This technology is modeled primarily on the collective decision-making processes employed by honeybee swarms [4]. This framework was chosen because honeybee populations have been shown to reach optimal decisions by forming real-time closed-loop systems [10]. In fact, at a structural level, the decision-making methods observed in honeybee swarms are very similar to the decision-making processes observed in neurological brains [11, 12]. When reaching decisions, swarm and brains are both employ large populations of simple excitable units (i.e., bees and neurons) that operate in parallel to (a) integrate noisy data about the world, (b) weigh competing alternatives when a decision needs to be made, and (c) converge on preferred decisions as a uni?ed system. In both brains and swarms, outcomes are arrived upon through competition among sub-populations of simple excitable units. When one sub-population exceeds a threshold level of support, the corresponding alternative is chosen by the system. In honeybees, this enables the group to converge on optimal decisions across a wide range of tasks, for example when selecting the best possible hive location from a large set of options. Researchers have shown that honey bees converge on the best possible solution to this life-or-death deci- sion approximately 80% of the time [13, 14]. 722 L. Rosenberg and G. Willcox 1.2 Creating Human Swarms Unlike birds and bees and ?sh, humans have not evolved the natural ability to swarm, as we don’t possess the subtle skills that other organisms use to establish high speed feedback-loops among their members. Fish for example, when moving in schools, detect faint vibrations in the water around them. Birds, when ?ocking, detect subtle motions propagating through the formation. Honeybees, when reaching decisions as a uni?ed swarm, use complex body vibrations called a “waggle dance” to encode their changing views. To enable real-time swarming among groups of networked humans, specialized software is required to close the loop among all members. To solve this problem, a software platform (swarm.ai) was created to allow human groups to form real-time systems from anywhere in the world [1, 6]. 
Modeled after the decision-making process of honeybee swarms, swarm.ai enables groups of networked users to work in parallel to (a) integrate noisy information, (b) weigh competing alternatives when making deci- sions, and (c) converge on decisions, together as a real-time closed-loop system. As shown in Fig. 1 below, arti?cial swarms answer questions by moving a graphical puck to select among a set of answer options. Each participant provides their input by moving a graphical magnet with a mouse, touchpad, or touchscreen. By adjusting their magnet in relation to the moving puck, real-time participants can express their individual intent on the system as a whole. The input from each user is not a vote, but a continuous stream of vectors that varies freely over time. Because all members of the networked population can vary their intent continuously in real-time, as moderated by AI algo- rithms, the arti?cial swarm explores the decision-space, not based on the input of any single individual, but based on the emergent dynamics of the system as a whole. This enables complex deliberations to emerge among all participants at the same time, empowering the group to collectively consider each of the options and converge on the solution that best represents their combined knowledge, wisdom, and insights. Fig. 1. Real-time ASI choosing between options. Arti?cial Human Swarms Outperform Vegas Betting Markets 723 It is critical point out that participants do not only vary the direction of their individual intent, but also modulate the magnitude by manipulating the distance between their magnet and the puck. Because the puck is in ?uid motion throughout the decision-space, users need to continuously update the position and orientation of their magnet so that it stays close to the puck’s outer rim. This is important, for it requires participants to remain engaged throughout the decision-making process, continuously evaluating and re-eval- uating their individual thoughts and feelings with respect to the question at hand. If they stop moving their magnet in relation to the changing position of the puck, the distance grows and their applied sentiment wanes. 2 Forecasting Study To quantify the forecasting ability human swarms as compared to large Vegas betting markets, a 20-week study was conducted using randomly selected human subjects. The participants, who were self-reported sports fans, were split into weekly groups. Each group consisted of 25 to 35 participants, all of whom logged in remotely to the swarm.ai system. Human subjects were paid $3.00 for their participation in each weekly session, which required them to forecast the outcome of all ten hockey games being played that night. All subjects were required to make their forecasts in two ways – (a) as individuals reporting on a standard online survey, and (b) as a contributor to a real-time ASI system. For each hockey game, participants were tasked with forecasting the winner and the margin of victory, expressed as either (a) the team win by 1 goal, or (b) the team win by 2 or more goals. The margins were chosen to match common Vegas gambling spreads. Figure 2 below shows a snapshot of a human swarm comprised of 31 partici- pants in the process of predicting a match between Toronto and Calgary. Fig. 2. ASI in the process of forecasting an NHL game. 724 L. Rosenberg and G. Willcox As shown in Fig. 2, each real-time swarm is tasked with selecting from among four outcome options, indicating which team will win and which margin is most likely. 
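As a toy illustration of this closed-loop mechanic, and emphatically not the proprietary Swarm AI algorithm itself, the sketch below (Python) simulates a puck that integrates continuously applied per-participant pull vectors toward four outcome options. The option layout, the fixed participant preferences, the step size, and the convergence rule are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layout: four answer options placed around the puck's starting position
options = {"Team A by 1": np.array([ 1.0,  1.0]),
           "Team A by 2+": np.array([ 1.0, -1.0]),
           "Team B by 1": np.array([-1.0,  1.0]),
           "Team B by 2+": np.array([-1.0, -1.0])}
names = list(options)

# 30 simulated participants, each repeatedly pulling toward a preferred option
preferences = rng.choice(names, size=30, p=[0.40, 0.25, 0.20, 0.15])

puck = np.zeros(2)
for _ in range(500):
    # each "magnet" contributes a unit vector from the puck toward its preferred option
    pulls = [(options[p] - puck) / (np.linalg.norm(options[p] - puck) + 1e-9)
             for p in preferences]
    puck += 0.01 * np.mean(pulls, axis=0)   # the puck integrates the combined pull

# the group's answer is taken as the option the puck has moved closest to
winner = min(names, key=lambda n: np.linalg.norm(options[n] - puck))
print("swarm converged toward:", winner)
```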
Again, the participants do not cast discrete votes but express their intent continuously over time, converging together as a system. The image shown in Fig. 2 is a snapshot of the system as it moves across the decision-space and converges upon an answer, a process that generally required between 10 and 60 s to complete. In addition to forecasting each individual game, participants were asked to identify which of the weekly predictions is the most likely to be a correct assessment. In other words, which of the teams forecast to win their games that week should be deemed the “pick of the week” as a consequence of being the most likely team to win its game. Figure 3 shown below is an example of ASI system in the process of identifying the pick of the week. As shown, the system is selecting from among six possible teams to decide which is most likely to win its game that week. Fig. 3. ASI in process of identifying “Pick of the Week”. 2.1 Wagering Protocol By collecting predictions for each of the 10 weekly games as well as a top “pick of the week”, forecasting data was collected across all 20 weeks for accuracy comparison against Vegas betting markets. To enable ROI comparisons against betting markets, two standardized betting models were tracked across the 20-week period. In both models, an initial simulated betting pool of $100 was created as the starting point for ROI computations, the pools tracked over the 20-week period. Arti?cial Human Swarms Outperform Vegas Betting Markets 725 In “Wagering Model A,” a simple heuristic was de?ned which allocated weekly bets equal to 15% of the current betting pool, dividing it equally across all ten weekly fore- casts made by the ASI system. In “Wagering Model B,” a similar heuristic was de?ned which also allocated 15% of the current betting pool for use in weekly bets, but placed the entire 15% upon one game, identi?ed as “pick of the week”. Both pots were tracked over the 20-week period, using actual Vegas payouts to compute returns. Vegas odds used in this study were captured from www.sportsbook.ag, a popular online betting market. 3 Results Across the set of 200 games forecast by the ASI system, an accuracy rate of 61% was achieved. This compares favorably to the expected accuracy of 55% based on Vegas odds (p = 0.0665). Of course, the more important skill in forecasting sporting events is identifying which games can be predicted with high con?dence as compared to those games which are too close to call. This skill is re?ected in the “pick of the week” gener- ated by the ASI system. Across the 20 weeks, the system achieved 85% accuracy in correctly predicting the winner of the “pick of the week” game. This compares very favorably to the expected accuracy of 62% based on Vegas odds. Figure 4 below shows the distribution of Vegas Odds for the twenty selected “pick of the week” games. As described above, the swarm-based system had a win rate of 85% across these same games. This is a signi?cant improvement, equivalent to reducing the error in Vegas Odds by 61%. The probability that the swarm outperformed Vegas Odds by chance was extremely low (p = 0.0057), indicating a highly signi?cant result. Fig. 4. Results across 20 weeks of NHL predictions. 726 L. Rosenberg and G. Willcox In addition, a betting simulation was run for each prediction set in which 15% of the current bankroll was bet on each weekly prediction. The performance of this model, when betting against Vegas is shown below in Fig. 5. 
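A sketch of such a bankroll simulation is given below (Python). The American-moneyline payout convention is assumed for concreteness, and the per-game odds and win/loss outcomes are placeholders rather than the actual odds captured from www.sportsbook.ag; the 15%-of-bankroll allocation rules follow the description in Sect. 2.1.

```python
def payout(stake, moneyline):
    """Profit on a winning bet at American odds (e.g. -150 favorite, +130 underdog)."""
    return stake * (100 / -moneyline) if moneyline < 0 else stake * (moneyline / 100)

def model_a_week(bankroll, games):
    """Model A: wager 15% of the bankroll, split evenly over all ten weekly games."""
    stake = 0.15 * bankroll / len(games)
    profit = sum(payout(stake, ml) if won else -stake for ml, won in games)
    return bankroll + profit

def model_b_week(bankroll, pick):
    """Model B: wager the full 15% on the single 'pick of the week'."""
    stake = 0.15 * bankroll
    ml, won = pick
    return bankroll + (payout(stake, ml) if won else -stake)

# One hypothetical week: (moneyline, did the forecast team win?) for ten games
week = [(-150, True), (-120, False), (+110, True), (-200, True), (+135, False),
        (-105, True), (+120, True), (-140, False), (-160, True), (+100, True)]

bank_a = model_a_week(100.0, week)
bank_b = model_b_week(100.0, week[3])   # suppose game 4 was the pick of the week
print(round(bank_a, 2), round(bank_b, 2))
```

Iterating the weekly update over 20 weeks and comparing the final bankroll with the initial $100 gives the cumulative ROI figures reported below.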
Starting with $100 and investing each week according to this strategy, the Pick of the Week strategy results in a gain of $270.20, equivalent to a 20-week ROI of 170%, and a week-over-week average ROI of 5.09%. For comparison, betting on all of the swarm’s picks evenly (for a total of 15% of the bankroll) results in $121.82, or a 20-week ROI of 21.8%, indicating that the swarm is selecting better than randomly among its picks. Fig. 5. Cumulative betting performance across 20 weeks. While it’s impressive to achieve 170% ROI over 20 weeks, we can gain additional insight into the signi?cance of this outcome by comparing against additional baselines. For example, we can compare these results to (a) randomly placed bets across all games played as a means of assessing if the swarm bets across all games are as signi?cant as they appear, and (b) bets placed on the Vegas favorite each week as a means of assessing if betting on the swarm’s top picks is as impressive as it seems. These baselines are shown in Fig. 6 as the green line and red line, respectively. Looking ?rst at random betting across all games, the net outcome across 20 weeks was $72.39, which equates to 28% loss over the test period. This is signi?cantly worse than the $122 (22% gain) achieved by betting on all swarm-based forecasts. Even more surprising, betting on the Vegas favorites each week resulted in a net outcome of $59, which equates to a 41% loss over the 20-week test period. This is signi?cantly worse than the $270 (170% gain) achieved by betting on the swarm’s top picks. Arti?cial Human Swarms Outperform Vegas Betting Markets 727 Fig. 6. Swarm performance vs Baseline performance across 20 weeks. 4 Conclusions Can real-time human swarms, comprised of average sports fans connected by swarming algorithms, outperform the predictive abilities of largescale betting markets? The results of this study suggest this is very much the case. As demonstrated across a set of 200 games during the 2017–2018 NHL hockey season, an ASI systems comprised of approx- imately 30 typical sports fans, were able to out-forecast Vegas betting markets. This was most signi?cant when the ASI system identi?ed a “pick of the week” as the most likely game to achieve the predicted outcome. Across the 20 weeks, the system achieved 85% accuracy when predicting the “pick of the week” games, which compares favorably to the expected accuracy of 62% based on Vegas odds. The probability that the system outperformed Vegas by chance was extremely low (p = 0.006), indicating a highly signi?cant result. In addition, when using the “pick of the week” within a simple automated wagering heuristic, a simulated betting pool that started at $100, grew to $270 over the 20-week period based on the swarm-based predictions. This was a 170% ROI. Additional work is being conducted to optimize this wagering heuristic, as there appears to be room for improvement when optimizing Vegas wagers based on a swarm-based predictive intel- ligence. Looking towards future research, additional studies are planned to better under- stand which types of problems are best suited for solutions using “human swarms” as well as the impact of swarm size on output accuracy. References 1. Rosenberg, L.: Human swarms, a real-time method for collective intelligence. In: Proceedings of the European Conference on Arti?cial Life 2015, pp. 658–659 2. Rosenberg, L.: Arti?cial swarm intelligence vs human experts. In: 2016 International Joint Conference on Neural Networks (IJCNN). IEEE 728 L. Rosenberg and G. 
Willcox 3. Rosenberg, L., Baltaxe, D., Pescetelli, N.: Crowds vs Swarms, a Comparison of Intelligence. In: IEEE 2016 Swarm/Human Blended Intelligence (SHBI), Cleveland, OH (2016) 4. Baltaxe, D., Rosenberg, L., Pescetelli, N.: Amplifying prediction accuracy using human swarms. In: Collective Intelligence 2017, New York, NY (2017) 5. McHale, I.: Sports Analytics Machine (SAM) as reported by BBC. http://blogs.salford.ac.uk/ business-school/sports-analytics-machine/ 6. Rosenberg, L., Willcox, G.: Arti?cial Swarms ?nd Social Optima. In: 2018 IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA 2018) – Boston, MA (2018) 7. Bonabeau, E.: Decisions 2.0: The power of collective intelligence. MIT Sloan Manag. Rev. 50(2), 45 (2009) 8. Woolley, A.W., Chabris, C.F., Pentland, A., Hashmi, N., Malone, T.W.: Evidence for a collective intelligence factor in the performance of human groups. Science 330(6004), 686– 688 (2010) 9. Surowiecki, J. The wisdom of crowds. Anchor (2005) 10. Seeley, T.D., Buhrman, S.C.: Nest-site selection in honey bees: how well do swarms implement the ‘best-of-N’ decision rule? Behav. Ecol. Sociobiol. 49, 416–427 (2001) 11. Marshall, J., Bogacz, R., Dornhaus, A., Planqué, R., Kovacs, T., Franks, N.: On optimal decision-making in brains and social insect colonies. Soc. Interface (2009) 12. Seeley, T.D., et al.: Stop signals provide cross inhibition in collective decision-making by honeybee swarms. Science 335(6064), 108–111 (2012) 13. Seeley, T.D.: Honeybee Democracy. Princeton University Press, Princeton (2010) 14. Seeley, T.D., Visscher, P.K.: Choosing a home: how the scouts in a honey bee swarm perceive the completion of their group decision making. Behav. Ecol. Sociobiol. 54(5), 511–520 Arti?cial Human Swarms Outperform Vegas Betting Markets 729 Genetic Algorithm Based on Enhanced Selection and Log-Scaled Mutation Technique Neeraj Gupta1(B) , Nilesh Patel1 , Bhupendra Nath Tiwari2 , and Mahdi Khosravy3 1 Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA {neerajgupta,npatel}@oakland.edu 2 INFN-Laboratori Nazionali di Frascati, Via. E. Fermi, 40 – I – 00044, Frascati, Rome, Italy bhupendray2.tiwari.phd@iitkalumni.org 3 Department of Electrical and Electronics Engineering, Fedral University of Juiz de Fora, Juiz de Fora, Brazil mahdi.khosravy@ufjf.edu.br Abstract. In this paper, we introduce the selection and mutation schemes to enhance the computational power of Genetic Algorithm (GA) for global optimization of multi-modal problems. Proposed operators make the GA an e?cient optimizer in comparison of other variants of GA with improved precision, consistency and diversity. Due to the presented selection and mutation schemes improved GA, as named Enhanced Selec-tion and Log-scaled Mutation GA (ESALOGA), selects the best chro-mosomes from a pool of parents and children after crossover. Indeed, the proposed GA algorithm is adaptive due to the log-scaled mutation scheme, which corresponds to the ?tness of current population at each stage of its execution. Our proposal is further supported via the sim-ulation and comparative analysis with standard GA (SGA) and other variants of GA for a class of multi-variable objective functions. Addi-tionally, comparative results with other optimizers such as Probabilistic Bee Algorithm (PBA), Invasive Weed Optimizer (IWO), and Shu?ed Frog Leap Algorithm (SFLA) are presented on higher number of vari-ables to show the e?ectiveness of ESALOGA. 
Keywords: Selection operator · Mutation operator · Log-scaled mutation · Diversity preservation · Genetic algorithms · Metropolis algorithm

1 Introduction

Rapid industrial growth and the efficient utilization of available resources are of prime importance nowadays, for example, route identification in traffic systems, optimization of process allocation to maximize production, utilization of energy resources in power systems, optimization of VLSI circuit design, CAN optimization in vehicles, etc. [1–12]. Most industrial problems are complex in nature and belong to combinatorial optimization, where the main focus is to optimize discrete variables so as to maximize or minimize the required objectives [1,2]. Two traditional methods are available to solve this type of problem, namely integer programming and dynamic programming; these are known as exact algorithms [3,5]. However, for very large optimization problems where a fast solution is required, such algorithms cannot be relied on because of their computational complexity. Optimization in this respect may be critically important for the sustainable growth of industries competing in highly uncertain economic environments [3–5].

Hence, in the last two decades, a large number of researchers have focused on approximate methods as an alternative approach for solving combinatorial problems, producing solutions close to the optimal state in a reasonably acceptable time. The development of heuristic algorithms in the fields of mathematics, engineering, etc. [6,7] has demonstrated successful implementations for real-life problems. As a result, a considerable number of heuristic evolutionary algorithms have been invented to work efficiently on linear/nonlinear, differentiable/non-differentiable, and concave/convex problems with discrete variables [6–9]. A general description of complex functions can be seen in [10], their applications with discrete variables to power system design in [11,12], and applications involving the capacity of energy generators, the quantity of goods produced, the number of vehicles on a route, etc. in [13–15].

Since GA works on binary variables, hardware-friendly algorithms have been proposed in many variants to solve combinatorial problems. The literature survey shows considerable scope to improve GA further by an appropriate combination of mathematical modeling and heuristic concepts [9]. GA and its associated variants have been shown to give globally optimal solutions, especially for multi-modal, non-differentiable, combinatorial, and industrial problems [16–18]. Moreover, GA is very easy to implement and has the advantage that its operators are developed in a simple way from the inspiration of genetic processes, which have been rigorously investigated at a large scale during the last two decades [1–19]. As developed by John Henry Holland [20], GA is inspired by the "survival of the fittest" principle, which mimics the natural process of evolution in terms of several operators: the selection, crossover, and mutation operators [20].
An adaptation of these operators has been analyzed and modeled by a large community of researchers, several of whom have provided evidence for, and improved, GA by introducing novel selection approaches for the fittest individuals, new crossover variants, and new mutation schemes. These improved GA models keep the search from getting stuck in premature convergence. In the light of GA research, this paper offers a combination of mathematical modeling and a heuristic approach in order to find the globally optimal solutions of multimodal nonlinear functions. It is worth mentioning that over the last few decades GA has been established as a successful heuristic evolutionary technique for addressing various global combinatorial industrial problems, and it has been widely used due to its simple structure; see for instance [9,16,21–23].

Nevertheless, although GA has powerful optimization fundamentals, it also has a few drawbacks, which are discussed in a number of readings [8,9,16,24]. GA converges prematurely due to improper selection, crossover and mutation probabilities and the associated criteria [25–27]. In these papers, a variant of GA is described as a modification of the GA model parameters, i.e., the selection method, crossover operator, mutation operator, and the underlying probabilities. Based on [28], elitism ensures that winner chromosomes go into the next-generation process, which moves the search from a premature to a mature phase. This is exploited in Sect. 3. Hereby, in the light of Adaptive GA [29], our proposal further evolves the mutation probability based on the present state of all candidates by using probabilistic modeling.

This paper is structured in seven sections. Firstly, in Sect. 2, we provide a brief step-by-step description of the GA algorithm, since our proposal arises as an improvement of it. In Sect. 3, the most important part of this paper, a brief description of the proposed enhanced selection scheme and log-scaled mutation operators is provided. Consequently, Sect. 4 presents a binary coded Enhanced Selection and Log-scaled Mutation Genetic Algorithm (ESALOGA) that, as an optimization package, solves combinatorial problems. Section 5 presents simulated results in comparison with other variants of GA and three real-coded optimizers on multi-modal benchmark functions. Finally, Sects. 6 and 7 respectively conclude the paper and give future research directions and improvements.

2 Binary Coded GA

A step-by-step operation of the binary coded GA is presented [9], which first allows one to understand the concept of GA and the symbiotic integration of its different operators, i.e., the selection, crossover and mutation operators.

Step 1: At first, the parameters of GA are initialized: the crossover and mutation probabilities $P_c$ and $P_m$, with $P_m \ll P_c$; the number of chromosomes $s$ in the population; and the number of bits $l$ used to represent one variable, which decides the length of the chromosomes, namely $nl$ for $n$ variables in the chosen problem. The termination criterion, i.e., the maximum number of generations that GA may run, is selected based on the problem size.

Step 2: To start the evolution process, the fitness of each chromosome in the population is calculated. In this process, the part of the binary chromosome representing a variable is decoded into a decimal value $d_n = \sum_{i=0}^{l-1} 2^i\, b_i^n$, where $b_i^n \in \{0,1\}$ belongs to the $n$th variable. The value of the $n$th variable is obtained within the bounds $x_n^{(L)} \le x_n \le x_n^{(U)}$, where $x_n$ is calculated as $x_n = x_n^{(L)} + \frac{x_n^{(U)} - x_n^{(L)}}{2^{l}-1}\, d_n$ from its respective lower and upper bounds $x_n^{(L)}$ and $x_n^{(U)}$. After converting the variables into the required domain, the associated objective function $f(x)$ is calculated for all individuals represented by the chromosome strings in the population. For a minimization problem, the fitness function $F_s$ associated with chromosome $s$ is adopted as $F_s = \frac{1}{1+f_s(x)}$, which is a function of the objective function $f_s(x)$.
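A small sketch of this decoding and fitness evaluation is given below (Python). The least-significant-bit-first ordering, the example objective function, the bounds, and the bit width are placeholder assumptions chosen for illustration.

```python
def decode_variable(bits, lower, upper):
    """Map a list of 0/1 bits to a real value in [lower, upper] using
    d = sum(2^i * b_i) and x = lower + (upper - lower) * d / (2^l - 1)."""
    l = len(bits)
    d = sum(b << i for i, b in enumerate(bits))
    return lower + (upper - lower) * d / (2 ** l - 1)

def decode_chromosome(chromosome, bounds, bits_per_var):
    """Split an nl-bit chromosome into n variables and decode each of them."""
    return [decode_variable(chromosome[i * bits_per_var:(i + 1) * bits_per_var], lo, hi)
            for i, (lo, hi) in enumerate(bounds)]

def fitness(chromosome, objective, bounds, bits_per_var):
    """Fitness for a minimization problem: F = 1 / (1 + f(x))."""
    x = decode_chromosome(chromosome, bounds, bits_per_var)
    return 1.0 / (1.0 + objective(x))

# Illustrative use: sphere function in two variables, 8 bits per variable
sphere = lambda x: sum(v * v for v in x)
chrom = [1, 0, 1, 1, 0, 0, 1, 0,   0, 1, 1, 0, 1, 0, 0, 1]
print(fitness(chrom, sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)], bits_per_var=8))
```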
Step 3: At this point, a selection operator selects the fittest chromosomes as the candidates that go for mating, based on roulette wheel selection [9,30]. This is the first stage of the GA process for which multiple different operators have been proposed, i.e., the roulette wheel as in the standard GA, and tournament and uniform selection in variants of GA. In our proposal, we introduce an enhanced selection scheme, which is applied after Step 4 instead of in Step 3.

Step 4: Next, the crossover operator produces a number of strings from the mating pool using a fixed crossover probability $P_c$. For a selected pair of candidates, known as parents, a cross-site is generated randomly in the interval $(0, nl-1)$ and the selected regions are swapped between the two parents. At this step, different crossover mechanisms have been proposed, such as single-point, multi-point, and uniform crossover. The use of different crossover techniques turns the standard GA into one of its variants.

Step 5: After the above serial processes, children chromosome strings arise as the result; their population is known as the intermediate population, as in [9]. At this step, we have a pool of parents and their resulting offspring. Our proposal aims to answer which candidates should go to the next evolution phase as better parents.

Step 6: At this juncture, bitwise mutation is carried out: as a result of the mutation operator, a selected bit in the chromosome is flipped to the opposite binary value based on a relatively low fixed mutation probability $p_m$. To make the process adaptive with respect to the current status of the population, we propose a log-scaled mutation technique.

Step 7: If the termination criterion is not reached, return to Step 2.

3 Selection and Mutation Schemes

In this section, we provide the step-by-step working principle of the proposed enhanced selection and log-scaled mutation operators, which turn the improved GA (ESALOGA) into a better optimization technique.

3.1 Proposed Selection Operator

Based on the Metropolis algorithm [31], we focus on possible improvements of the GA for finding the optimal solution in the course of the crossover, while selecting chromosome strings. This keeps intact a high degree of diversity in selecting the children that are most suitable when the chosen parents undergo a crossover. To choose appropriate candidates from the current pool of parents and offspring, a block diagram of the proposed selection strategy is given in Fig. 1. Mathematically, this is realized by introducing a selection probability of Boltzmann form. Precisely, let $T$ be the temperature; then the selection probability $p(T)$ reads as the Maxwellian (Boltzmann) factor

$$p(T) = e^{-\Delta E / kT}, \qquad (1)$$

where $\Delta E$ represents the change in energy between the chosen parents and children. With the above probability $p(T)$, a set of selected strings is passed to the next stage of evolution.
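The acceptance rule at the heart of this operator can be sketched as follows (Python). It is a simplified, pooled version of the pairwise procedure detailed in the steps below; the fitness values, the temperature, and the constant k are placeholders.

```python
import math
import random

def boltzmann_accept(fitness_ref, fitness_candidate, temperature, k=1.0):
    """Metropolis-style test: accept the candidate string with probability
    p = min(1, exp(-(F_ref - F_candidate) / (k * T))); a candidate at least
    as fit as the reference string is always accepted."""
    delta_e = fitness_ref - fitness_candidate
    p = min(1.0, math.exp(-delta_e / (k * temperature)))
    return random.random() < p

def select_from_pool(pool_fitnesses, temperature):
    """From a pool of parents and children (represented here only by their
    fitness values), keep the indices of the strings that pass the
    Boltzmann acceptance test against the fittest member of the pool."""
    fittest = max(pool_fitnesses)
    return [i for i, f in enumerate(pool_fitnesses)
            if boltzmann_accept(fittest, f, temperature)]

# Illustrative pool: two parents and two children with made-up fitness values
pool = [0.62, 0.55, 0.71, 0.40]
print(select_from_pool(pool, temperature=0.1))
```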
It is worth mentioning that the principle of elitism [9] retains the string with the best fitness value from a given pool of parents and children. Following (1), the subsequent strings are selected that were the fittest strings in the previous stage of the evolution.

Fig. 1. Flow diagram for the selection strategy after crossover.

The proposed model is realized as per the following steps:

Step 1: Choose an initial value of the temperature T as

T = α M / I,   (2)

where M is the maximum value of the fitness function {F_s | s = 1, 2, ..., 20}, I is the number of iterations and s labels the strings pertaining to the crossover of a given population. Note that the initial value of the temperature T is taken as large as possible, such that it decreases in the subsequent iterations to its desired value. Here, the proportionality constant α is set as per the chosen algorithm.

Step 2: In order to find the energy difference, one chooses the jth string in the given pool of parents and children and subtracts its fitness value F_j from that of a previously selected string, F_f. In other words, the energy difference that governs the probability distribution is given by

ΔE = F_f − F_j   (3)

with j = 1, 2, 3.

Step 3: Compute p as per Eq. (1) and obtain its minimum value as

p = min(1, e^(−ΔE / kT)).   (4)

Step 4: Acquire a random number r ∈ (0, 1).

Step 5: If r < p, the candidate string is selected, and the corresponding previously selected partner string is the fittest one.

Step 6: Else, go to Step 2 and repeat the search. In the case when none of the strings is selected, one increases the value of the mutation probability p_m. In practical situations, we may consider the value p_m = 0.1.

Step 7: Finally, one selects the partner string chromosome by repeating Steps 2 to 6.

3.2 Proposed Mutation Operator

In this subsection, we offer the log-scaled mutation strategy as given in Fig. 2, with the corresponding operations as below:

Step 1: Obtain the mutation probability for a given fitness value F_s via the transformation y_s = log10 F_s.

Step 2: For the maximum fitness value F_s^max, define y_s^max = log10 F_s^max.

Step 3: Corresponding to the minimum fitness value F_s^min, define y_s^min = log10 F_s^min.

Step 4: y_s^max is mapped to the minimum mutation probability p_m^min, such that the best candidates remain intact.

Step 5: y_s^min is mapped to the maximum mutation probability p_m^max, such that the worst candidates mutate.

Step 6: Define a linear relationship between y_s and p_{m,s} as

p_{m,s} = p_m^max − ((p_m^max − p_m^min) / (y_s^max − y_s^min)) (y_s − y_s^min),   (5)

where the (negative) ratio of p_m^max − p_m^min and y_s^max − y_s^min gives β, the slope of the line plotted between p_{m,s} and y_s. This leads to the following linear equation:

p_{m,s} = β y_s + γ,   (6)

where γ is the intercept of the line in (6),

γ = p_m^max + ((p_m^max − p_m^min) / (y_s^max − y_s^min)) y_s^min.   (7)

With the above slope β and intercept γ, the mutation probability p_{m,s} is obtained by the following logarithmic relation:

p_{m,s} = β log10 F_s + γ,   (8)

where s labels the chromosome at hand. Physically, this shows the inverse relation [9] between the fitness value F_s and the mutation probability p_{m,s}.

Fig. 2. Log-scaled mutation strategy.

Step 7: This assigns a unique mutation probability p_{m,s} to each candidate string in the range (p_m^min, p_m^max), viz. we have

p_m^min ≤ p_{m,s} ≤ p_m^max.   (9)

Step 8: Finally, diversity in the selected population is realized by a bitwise mutation process.
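As a minimal illustration (not the authors' code), the mapping of Eqs. (5)–(9) from fitness values to per-string mutation probabilities can be sketched as follows; the default bounds p_min and p_max are illustrative assumptions, not values fixed by the paper.

```python
import math

def log_scaled_mutation_prob(fitness, p_min=0.001, p_max=0.05):
    """Map each fitness value F_s to a mutation probability in [p_min, p_max]
    via y_s = log10(F_s); the fittest string gets p_min, the least fit gets p_max."""
    y = [math.log10(f) for f in fitness]          # Step 1: y_s = log10(F_s)
    y_max, y_min = max(y), min(y)                 # Steps 2-3
    if y_max == y_min:                            # degenerate population: no spread
        return [p_min] * len(fitness)
    beta = -(p_max - p_min) / (y_max - y_min)     # negative slope, Eq. (6)
    gamma = p_max - beta * y_min                  # intercept, Eq. (7)
    return [beta * ys + gamma for ys in y]        # Eq. (8), bounded as in Eq. (9)
```

For example, log_scaled_mutation_prob([0.9, 0.5, 0.1]) assigns the smallest probability to the fittest string (0.9) and the largest to the weakest (0.1), which is the inverse relation stated above.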
The fitness values of the strings are usually sparse; thus we propose a log-scaled mutation operator. In this approach, we find that all mutation probabilities are kept within a specified range, irrespective of variations in the fitness values. This makes our proposal adaptive and yields an evolution from the premature to the mature phase of a given population. In a nutshell, we have illustrated that there is a non-linear relationship between mutation probability and fitness value as far as evolutionary algorithms are concerned. In addition, it follows that a higher fitness value leads to a lower mutation probability. This provides a comparatively larger search space while finding the global optimal solution.

4 Proposed GA (ESALOGA)

Based on the proposed enhanced selection and log-scaled mutation strategies, we provide below the pseudo-code of the algorithm. For the given input parameters, a binary initial population P is generated randomly; the candidate chromosome strings and the mutation probability p_m are adaptively selected by the enhanced selection operation (EnSelection) and within the given mutation range (p_m^min, p_m^max), respectively. Produce the mating pool for breeding for a given crossover probability p_c. Extract two parents from the mating pool using the standard roulette wheel (RW) selection operator. Indeed, other selection schemes such as tournament and uniform selection could be adopted as well for better performance. Perform the single-point crossover operation to produce two children. In fact, instead of single-point, the use of two-point or uniform crossover may enhance the computational capability. At this junction, form a pool of the two parents and their produced children, and choose two appropriate candidate strings using the enhanced selection (EnSelection) operator with probability p(T) as in (1). As a result, two appropriate candidates are selected to go into the next evolution. When no chromosomes are selected from the pool, mutate all the strings with an increased mutation probability p_m and repeat the EnSelection operation. After this operation we get the intermediate population, which is subjected to the mutation operator with mutation probability p_m. Following the log-scaled strategy (LSMut), produce the population of mutated strings (Pm), as in Algorithm 1. The best chromosomes are then taken from the above two populations, as shown in line 13 of the algorithm. Repeat these steps until the termination criterion is reached.

Algorithm 1. Pseudo-code for the proposed ESALOGA
Require: N: the number of chromosomes, pc: crossover probability, tmax: maximum iterations, pm: mutation probability, pm^min: lower bound on pm, pm^max: upper bound on pm, b: number of bits to represent one variable, v: number of variables.
P ← round(rand(N, b·v))  : initialize binary population randomly
1: GP ← best of [P]  : GP is the best solution in the current P
2: for i ← 1 to tmax do
3:   n ← 1
4:   while n ≤ N do
5:     [Parent1, Parent2] ← Selection(P)  : RW or tournament selection operation
6:     [Children1, Children2] ← Xover(Parent1, Parent2)  : crossover operation
7:     [string1, string2] ← EnSelection(Parent1, Parent2, Children1, Children2)  : enhanced selection operation to select two appropriate strings
8:     P(n) ← string1
9:     P(n+1) ← string2
10:    n ← n + 2
11:  end while
12:  Pm ← LSMut(P)  : log-scaled mutation after crossover
13:  P ← N best chromosomes of [P, Pm]
14:  GP(i) ← best of [P]
15:  if Fitness(GP(i)) < Fitness(GP(i−1)) then
16:    GP(i) ← GP(i−1)
17:  end if
18: end for
19: return GP

5 Results and Discussion

In this section, we demonstrate the effectiveness of the proposed GA on various benchmark functions [9,33]. We compare a few variants of the GA, distinguished by their different selection and crossover strategies; an outline is given in Table 1. All of these variants are discussed in [32,33] and tested on the benchmark functions that are concisely tabulated in Table 2. We firstly present the results on the Goldstein–Price, Levi, Beale, Himmelblau, Ackley and Rastrigin benchmark functions. Note that the Rastrigin and Himmelblau functions are multimodal in their nature, while the Ackley function possesses a large hole at its center together with multimodality. On the other hand, the Beale function is unimodal, with four sharp peaks at the corners. Similarly, the Levi function has a non-linear search space that may lead to premature convergence in due course of the execution of an optimization algorithm. Equally, it is worth noticing that an optimization algorithm may get trapped in some of the local minima of the objective function, which our proposal overcomes by maintaining a larger diversity, as shown in Fig. 3 for different problems. Simulation results for the comparative analysis of ESALOGA with respect to the standard GA, VGA-1, VGA-2, VGA-3 and VGA-4 are given in Table 3 for 100 runs on the aforementioned two-variable problems.

Table 1. Selection and crossover strategies in variants of GA (VGA)

GA variant   Selection    Crossover
SGA          RW           Single-point
VGA-1        Random       Two-point
VGA-2        RW           Uniform
VGA-3        Random       Uniform
VGA-4        Tournament   Uniform

Table 2. Benchmark functions for testing ESALOGA

Himmelblau: f(x_1, x_2) = (x_1^2 + x_2 − 11)^2 + (x_1 + x_2^2 − 7)^2, with variable limits −6 ≤ x_1, x_2 ≤ 6
Rastrigin: f(x) = A n + Σ_{i=1}^{n} (x_i^2 − A cos(2π x_i)), with variable limits −5.12 ≤ x_i ≤ 5.12
Ackley: f(x_1, x_2) = −20 exp(−0.2 √(0.5 (x_1^2 + x_2^2))) − exp(0.5 (cos(2π x_1) + cos(2π x_2))) + e + 20, with a = 20, b = 0.2, c = 2π and variable limits −35 ≤ x_i ≤ 35
Beale: f(x_1, x_2) = (1.5 − x_1 + x_1 x_2)^2 + (2.25 − x_1 + x_1 x_2^2)^2 + (2.625 − x_1 + x_1 x_2^3)^2, with variable limits −4.5 ≤ x_1, x_2 ≤ 4.5
Levi: f(x_1, x_2) = sin^2(3π x_1) + (x_1 − 1)^2 (1 + sin^2(3π x_2)) + (x_2 − 1)^2 (1 + sin^2(2π x_2)), with variable limits −10 ≤ x_1, x_2 ≤ 10
Goldstein–Price: f(x_1, x_2) = (1 + (x_1 + x_2 + 1)^2 (19 − 14x_1 + 3x_1^2 − 14x_2 + 6x_1 x_2 + 3x_2^2)) (30 + (2x_1 − 3x_2)^2 (18 − 32x_1 + 12x_1^2 + 48x_2 − 36x_1 x_2 + 27x_2^2)), with variable limits −2 ≤ x_1, x_2 ≤ 2
Styblinski–Tang: f(x) = (1/2) Σ_{i=1}^{n} (x_i^4 − 16x_i^2 + 5x_i), with variable limits −5 ≤ x_i ≤ 5
Michalewicz: f(x) = −Σ_{i=1}^{n} sin(x_i) sin^{2m}(i x_i^2 / π), with variable limits 0 ≤ x_i ≤ π
Schaffer No. 2: f(x) = 0.5 + Σ_{i=1}^{n−1} (sin^2(x_i^2 − x_{i+1}^2) − 0.5) / (1 + 0.001 (x_i^2 + x_{i+1}^2))^2, with variable limits −100 ≤ x_i ≤ 100
Deceptive: f(x) = −((1/n) Σ_{i=1}^{n} g_i(x_i))^β, with variable limits 0 ≤ x_i ≤ 1 and β = 2
Keane bump: f(x) = −| (Σ_{i=1}^{n} cos^4(x_i) − 2 Π_{i=1}^{n} cos^2(x_i)) / (Σ_{i=1}^{n} i x_i^2)^{0.5} |, subject to g_1(x) = 0.75 − Π_{i=1}^{n} x_i < 0 and g_2(x) = Σ_{i=1}^{n} x_i − 7.5n < 0

Results are compared on six attributes: the best value achieved by each algorithm, the mean of all solutions over the 100 runs, the standard deviation (Std) of the solutions achieved in the 100 runs, the reliability of each algorithm (the fraction of runs whose achieved solution is lower than the mean of the proposed GA), the worst value achieved, and finally the average time taken by each algorithm for 1000 evolution epochs. These averaged measurements give a consistent and accurate determination of the approximate global optimal point, and thereby of the effectiveness of our proposed algorithm. Interestingly, while SGA and the other variants get trapped in one of their local optima, our proposed algorithm successfully terminates by locating the global optimum for the various benchmark functions.

The corresponding comparative results on diversity preservation are depicted in Fig. 3. In this figure, one can observe the spread of the search for the Himmelblau, Beale, Ackley and Levi functions. As we can see, SGA gets trapped at one point, whereas ESALOGA examines different points while searching for the global solution. An approximately similar effect can be seen for the other functions. We address the issue of premature convergence of the algorithm through diversity preservation, where most of the GA variants behave similarly. Thus, we have proposed an enhanced selection scheme to overcome this condition of premature convergence. We can equally maintain diversity preservation adequately, as shown in Fig. 3, which makes our algorithm relatively efficient.

Fig. 3. Comparative result for the diversity preservation for the same number of generations (Left: Standard GA, Right: Proposed GA).
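Before turning to the comparative results in Table 3, two of the Table 2 benchmarks can be written directly in Python as a small reference sketch (standard library only; the Rastrigin constant A = 10 is the usual choice and is an assumption here, since Table 2 leaves A symbolic).

```python
import math

def rastrigin(x, A=10.0):
    """Rastrigin: A*n + sum(x_i^2 - A*cos(2*pi*x_i)); global minimum 0 at x = 0."""
    return A * len(x) + sum(xi * xi - A * math.cos(2 * math.pi * xi) for xi in x)

def ackley(x1, x2):
    """Two-variable Ackley as given in Table 2; global minimum 0 at (0, 0)."""
    return (-20.0 * math.exp(-0.2 * math.sqrt(0.5 * (x1 * x1 + x2 * x2)))
            - math.exp(0.5 * (math.cos(2 * math.pi * x1) + math.cos(2 * math.pi * x2)))
            + math.e + 20.0)
```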
Table 3. Comparative simulation results of the proposed GA and other GA variants in 100 runs

                 PGA         SGA         VGA-1       VGA-2       VGA-3       VGA-4
Goldstein–Price
  Best           3.0010      3.0010      3.0010      3.0010      3.0010      3.0010
  Mean           3.0806      11.4169     11.5692     6.1199      5.7327      12.7232
  Std            0.0734      17.1370     17.9966     14.0022     8.2900      17.3742
  Reliability    60%         56%         56%         60%         60%         50%
  Worst          3.313       88.868      84.080      89.541      32.634      76.699
  Time           1.2953      2.9910      3.2558      2.9816      2.9116      3.6707
Levi
  Best           7.8091e-04  5.5598e-05  5.5598e-05  5.5598e-05  5.5598e-05  5.5598e-05
  Mean           0.0268      0.0555      0.0712      0.1220      0.1019      0.1432
  Std            0.0270      0.1362      0.1733      0.2254      0.1790      0.3829
  Reliability    60%         60%         70%         64%         54%         48%
  Worst          0.110       0.725       0.725       0.725       0.725       2.600
  Time           1.3567      3.1313      3.7612      3.0177      3.3239      3.9555
Beale
  Best           3.1186e-05  8.0472e-05  8.0472e-05  3.1186e-05  3.1186e-05  2.1385e-04
  Mean           0.0024      0.2249      0.2651      0.2645      0.1736      0.2841
  Std            0.0027      0.3054      0.4824      0.3083      0.2815      0.3226
  Reliability    60%         14%         14%         14%         24%         60%
  Worst          0.012       0.926       2.689       0.816       0.926       0.974
  Time           1.3624      2.9880      3.2909      2.9839      2.9306      3.6881
Himmelblau
  Best           3.9863e-05  4.9682e-04  4.9682e-04  4.9682e-04  4.9682e-04  4.9682e-04
  Mean           0.0633      0.2900      0.1968      0.2373      0.1486      0.6404
  Std            0.2850      0.7672      0.5683      0.6091      0.4067      1.2703
  Reliability    96%         76%         84%         76%         86%         64%
  Worst          1.444       4.705       2.755       3.717       1.643       6.625
  Time           50.8745     3.6972      4.1740      3.4376      3.4056      4.5079
Ackley
  Best           0.0182      0.1982      0.1982      0.1982      0.1982      0.1982
  Mean           0.0372      0.6684      0.4341      0.6788      0.3161      0.9028
  Std            0.0097      1.1057      0.8200      1.0503      0.5920      1.2887
  Reliability    42%         0%          0%          0%          0%          0%
  Worst          0.061       3.639       3.639       3.639       3.639       3.639
  Time           1.4144      3.0694      3.3770      3.3646      3.2705      3.6703
Rastrigin
  Best           0.0099      0.0104      0.0104      0.0104      0.0104      0.0104
  Mean           0.2907      1.0351      1.2141      0.9595      1.5399      2.2707
  Std            0.3054      1.0248      1.5045      1.0608      1.3773      2.1881
  Reliability    66%         32%         32%         34%         20%         16%
  Worst          1.0160      4.1020      7.9655      4.9817      5.0958      9.1854
  Time           1.3768      2.8542      3.1760      2.8413      3.0166      3.9116

Further, we see from Fig. 3 that our proposal reveals various local and global optimal points of the aforementioned benchmark functions and offers a great diversity in the searching process, instead of returning to the same point under different evolutions. This yields an appropriate optimization with high diversity preservation in a given mating pool. We find an improved reliability (in percentage), as shown in Table 3, in contrast to the standard GA and its other variants, which get trapped in an intermediate suboptimal state most of the time. Hereby, we find that the average performance of ESALOGA is favourable, as is its standard deviation. The time taken by ESALOGA is also reasonable. Moreover, from the results on the Himmelblau function, one can observe that ESALOGA tries its best to find a better solution, but at the cost of its runtime. This reveals that ESALOGA aims to deliver a better solution every time. As a matter of fact, our algorithm yields an intelligent mechanism to escape from suboptimal traps and local optima for this class of benchmark functions. By tuning the selective pressure to a higher value, we can generate the desired diversity in the population and scan the entire search space while searching for the global optimum. This provides an appropriate trade-off between the selective pressure and the diversity pressure.
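Before comparing against other optimizers, it may help to see Algorithm 1 end to end. The following Python sketch mirrors its main loop under several simplifying assumptions (it is an illustration, not the authors' implementation): all variables share one bound pair, the temperature is fixed rather than following the schedule of Eq. (2), and the population size, bit count and mutation bounds are illustrative defaults.

```python
import math
import random

def esaloga(objective, n_vars, bounds, N=20, bits=16, p_c=0.8,
            p_min=0.001, p_max=0.05, t_max=100, temperature=1.0):
    """Schematic rendering of Algorithm 1 (ESALOGA); operators follow Sects. 2-3,
    but constants and helper code here are illustrative assumptions."""
    lo, hi = bounds  # single bound pair shared by all variables (simplification)

    def decode(chrom):  # Step 2 of Sect. 2: bit string -> real variables
        xs = []
        for v in range(n_vars):
            gene = chrom[v * bits:(v + 1) * bits]
            d = sum(b << i for i, b in enumerate(gene))
            xs.append(lo + (hi - lo) * d / (2 ** bits - 1))
        return xs

    def fitness(chrom):  # F = 1 / (1 + f(x)), assumes f(x) >= 0 on the domain
        return 1.0 / (1.0 + objective(decode(chrom)))

    def roulette(pop, fits):  # standard roulette wheel selection
        r, acc = random.uniform(0, sum(fits)), 0.0
        for chrom, f in zip(pop, fits):
            acc += f
            if acc >= r:
                return chrom
        return pop[-1]

    def xover(p1, p2):  # single-point crossover with probability p_c
        if random.random() > p_c:
            return p1[:], p2[:]
        cut = random.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

    def en_select(pool):  # enhanced selection of Sect. 3.1 (Metropolis-style)
        best = max(pool, key=fitness)
        others = [c for c in pool if c is not best]
        keep = [c for c in others if random.random() <
                min(1.0, math.exp(-(fitness(best) - fitness(c)) / temperature))]
        return best, (keep[0] if keep else random.choice(others))

    def ls_mutate(pop, fits):  # log-scaled mutation of Sect. 3.2, Eqs. (5)-(9)
        ys = [math.log10(max(f, 1e-12)) for f in fits]
        y_max, y_min = max(ys), min(ys)
        out = []
        for chrom, y in zip(pop, ys):
            pm = p_min if y_max == y_min else \
                p_max - (p_max - p_min) * (y - y_min) / (y_max - y_min)
            out.append([1 - b if random.random() < pm else b for b in chrom])
        return out

    P = [[random.randint(0, 1) for _ in range(bits * n_vars)] for _ in range(N)]
    best = max(P, key=fitness)
    for _ in range(t_max):                                # lines 2-18 of Algorithm 1
        new_p = []
        while len(new_p) < N:                             # lines 4-11
            fits = [fitness(c) for c in P]
            p1, p2 = roulette(P, fits), roulette(P, fits)
            c1, c2 = xover(p1, p2)
            new_p.extend(en_select([p1, p2, c1, c2]))     # line 7
        P = new_p[:N]
        Pm = ls_mutate(P, [fitness(c) for c in P])        # line 12
        P = sorted(P + Pm, key=fitness, reverse=True)[:N] # line 13
        best = max([best] + P, key=fitness)               # lines 14-17 (elitism)
    return decode(best), objective(decode(best))
```

For example, esaloga(lambda x: (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2, n_vars=2, bounds=(-6, 6)) runs the sketch on the Himmelblau function from Table 2.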
5.1 Comparison of Proposed GA with Other Optimizers

In this section, we extend our algorithm to higher dimensions and provide a comparison of ESALOGA with other optimizers on certain complex functions, i.e., Rastrigin, Ackley, Schaffer No. 2 [34], Michalewicz [35], Styblinski–Tang [36], Deceptive [37,38] and the constrained Keane bump [39], as shown in Table 2. These functions can be extended to arbitrary dimensions and are difficult to investigate analytically due to their nonlinearity. They are related to real-world problems; for example, the Ackley function is considered as the free-energy hypersurface of proteins. Most of the above test functions add the difficulty of being less symmetric and possessing higher harmonics, which makes them difficult to solve and keeps the environment uncertain. Notably, the Schaffer function has concentric barriers, whereby it is capable of discriminating between different optimizers. Hereby, we have tested our algorithm on highly nonlinear, multimodal functions with a large number of local extrema. As mentioned above, one of them is the Michalewicz function, a peculiar mathematical function having n! local optima in n dimensions. Our optimization algorithm has given an improved solution, as shown below in Table 4. The Styblinski–Tang function [36] is considered further. Another complex function is the deceptive function, which finds its importance in discriminating between different optimizers; its computational difficulties are documented in the existing literature [37,38]. Here, we show the results on the above complex functions, which qualify our algorithm as an apt global optimizer. In the sequel, we focus on a constrained complex function in multiple dimensions, namely the Keane bump function, taken as a test function. It is highly nonlinear and difficult to solve with existing optimizers because its solution lies on a nonlinear boundary. The performance of ESALOGA has been analyzed in comparison to VGA-4, probabilistic bee optimization (PBA) [40], invasive weed optimization (IWO) [41] and the shuffled frog leaping algorithm (SFLA) [42]. The comparative results of the above optimizers on ten-dimensional test functions are shown in Table 4. The parameter settings of all the optimizers are taken as follows:

1. VGA-4 parameters:
   (a) Fifty chromosomes are taken in a population
   (b) The crossover probability is fixed at 0.8 to form the mating pool
   (c) The mutation probability is taken as 0.02
2. PBA parameters:
   (a) The number of scout bees is 50
   (b) The recruited bees scale is defined as round(0.3 × 50)
   (c) The neighborhood radius is set to 0.1 × (maximum variable value − minimum variable value)
   (d) The neighborhood radius damp rate is 0.9
3. IWO parameters:
   (a) The population size is taken as 50
   (b) The minimum and maximum numbers of seeds are 0 and 5, respectively
   (c) The variance reduction exponent is set to 2
   (d) The initial and final values of the standard deviation are 0.5 and 0.01, respectively
4. SFLA parameters:
   (a) The memeplex size is 25
   (b) The number of memeplexes is 2
   (c) The number of parents is defined as the maximum of the rounded value of (0.3 × 25) or 2
   (d) The number of offspring is taken as 3
   (e) The maximum number of iterations is 5
5. Proposed ESALOGA parameters:
   (a) 50 chromosomes are taken in the population
   (b) The crossover probability is 0.8 to form the mating pool
   (c) The mutation probability is adaptively defined between 0 and 0.05 by our proposed mutation scheme
   (d) The mutation probability during the enhanced selection procedure is 0.02

In one run of the optimization, all optimizers deliver their solution within five hundred generations. We run the proposed algorithm on all the above-mentioned benchmark functions fifty times to obtain performance statistics, and we compare the results on all the selected benchmark functions. From Table 4, we can deduce the pre-eminence of ESALOGA over PBA, IWO, SFLA and GA for the above class of test functions. The comparison is made on five indices: Best, Worst and Mean, achieved in 50 runs of each optimizer, Std, the standard deviation of the solutions over the 50 runs, and Consistency, defined as how often (in percentage) the optimizer reaches an expected solution.

Table 4. Comparative results on ten variables for fifty runs

                                 VGA-4      PBA        IWO        SFLA       PGA
Styblinski–Tang function
  Best                           -389.2077  -261.6483  -377.5249  -377.5249  -391.6528
  Worst                          -374.5288  -176.9046  -320.9780  -374.5288  -376.9688
  Mean                           -383.0037  -218.7373  -352.0788  -354.9062  -385.3267
  Std                            4.2495     22.0725    16.9144    15.4860    3.4056
  Consistency (solution < -383)  55%        0%         0%         0%         90%
Michalewicz extension function
  Best                           -9.5033    -3.4877    -9.3631    -9.2164    -9.6575
  Worst                          -8.1878    -2.2156    -7.9995    -8.1878    -8.2459
  Mean                           -8.9632    -2.8983    -8.8179    -8.6147    -9.0075
  Std                            0.3990     0.3521     0.4090     0.4242     0.3343
  Consistency (solution < -9)    55%        0%         40%        30%        65%
Ackley function
  Best                           0.0016     2.3168     0.0020     0          1.335e-04
  Worst                          2.0225     19.7360    18.8521    1.6538     1.6538
  Mean                           0.1086     11.6177    12.7090    0.4520     0.0828
  Std                            0.4506     4.8105     8.5472     0.8391     0.3698
  Consistency (solution < 0.1)   95%        0%         30%        75%        99%
Rastrigin function
  Best                           3.0071     9.9496     0.9955     2.9849     1.0173
  Worst                          21.2696    34.8234    16.9149    21.2696    14.2134
  Mean                           10.8882    24.4759    8.7562     14.8746    6.3935
  Std                            5.0115     5.7601     3.7172     8.1695     3.5663
  Consistency (solution < 10)    55%        50%        75%        35%        90%
Schaffer function No. 2
  Best                           -3.9918    -1.0227    -1.1854    -3.4150    -3.7789
  Worst                          -2.1046    0.0065     -0.1801    -2.1046    -2.6381
  Mean                           -3.1594    -0.0941    -0.5848    -2.6446    -3.3691
  Std                            0.4458     0.2305     0.2734     0.5166     0.2948
  Consistency (solution < -3)    55%        0%         0%         25%        75%
Deceptive function
  Best                           -0.9255    -0.4140    -0.7724    -0.8464    -0.9255
  Worst                          -0.7483    -0.2729    -0.7040    -0.7483    -0.7187
  Mean                           -0.8196    -0.3185    -0.7259    -0.7853    -0.7955
  Std                            0.0394     0.0389     0.0247     0.0326     0.0399
  Consistency (solution < 0.8)   40%        0%         0%         10%        100%
Keane bump function
  Best                           -0.7257    -0.2368    -0.7492    -0.7038    -0.7405
  Worst                          -0.6290    -0.1238    -0.2740    -0.6014    -0.6014
  Mean                           -0.6818    -0.1750    -0.5778    -0.5532    -0.6856
  Std                            0.0292     0.0278     0.1532     0.1073     0.0357
  Consistency (solution < 0.6)   30%        0%         25%        15%        55%

Based on the observations in Table 4, we can extract the following comparative results:

1. ESALOGA is more consistent than the other optimizers. In comparison with the other optimizers, we observe from the presented results that ESALOGA performs well for multimodal functions, which are highly complex in their nature according to the literature [37,38].
2. For the Styblinski–Tang, Ackley, Rastrigin and Deceptive functions, we find that no optimizer other than VGA-4 gives acceptable results, as seen in Table 4. Here, ESALOGA gives the optimal solution with a high consistency and a low standard deviation.
Table 4 shows that the consistency of ESALOGA is 90%, 90%, 99% and 100% for the Styblinski–Tang, Rastrigin, Ackley and Deceptive functions, respectively.
3. For the Michalewicz function, the best optimization is given by ESALOGA, with a consistent mean around −9.0075, which is the best in comparison to all the other optimizers.
4. The Schaffer function No. 2 is another highly complex function, which ESALOGA solves with better results than the other above-mentioned optimizers.
5. On the highly complex constrained test function, the Keane bump function, ESALOGA gives outstanding results compared with the other optimizers. Note that only the GA variant comes close to the results of ESALOGA.
6. Overall, the statistical results of ESALOGA are far better than those of the other optimizers as well.

6 Conclusion

In this paper, we have given an improved search technique based on biological evolution. It is well suited to optimizing multi-variable objective functions with and without discontinuities. As a matter of fact, the proposed operators are flexible in finding the global minimum of a benchmark function. Hereby, our proposition gives an improved technique for solving optimization problems. Further, we have given simulation results of our proposal as a variant of the standard GA. As verification, we have listed the global solutions of various two-variable benchmark functions. From the simulated results, it is found that our method precisely locates the optimal points of multimodal benchmark functions. Hereby, various drawbacks of the binary-coded GA, including imprecision and inconsistency, are taken care of by the Metropolis scheme. This provides an enhanced selection and an adaptive log-scaled mutation scheme. Subsequently, the global optimal solution is obtained with an acceptable value of the selection pressure. In other words, our proposal is a meta-heuristic approach as far as global optimization problems are concerned. Indeed, it gives improved precision and consistency, as revealed by the simulated results.

7 Future Scope

The proposed GA has considerable scope for further improvement, as discussed in this section. The first stage of improvement concerns a parallel population approach, which may give a better solution. To introduce more diversity, random selection or tournament selection can be tested instead of roulette wheel selection before crossover, whereas we propose selection after crossover; this is taken as a complementary selection scheme for introducing more diversity after crossover. In a follow-up paper, we will test the above-specified selection strategies with the proposed GA and compare them with the different available GA variants. The second improvement is at the stage of crossover, where different crossover techniques such as single-point, multi-point, uniform and mid-point techniques can be tested to examine the superiority of the proposed GA over the other variants. The results on the limited set of functions show the superiority of the proposed GA over SGA and the other GA variants. Thus, inserting the enhanced selection scheme, treated as a complementary selection after crossover, and the log-scaled mutation scheme into the structure of other GA variants may give better results as well. Moreover, the performance of the proposed GA can be increased by utilizing a binary tree memory.
Thus, we observe that the proposed GA has a wide scope of improvements and it may further emerge as a dominant optimization algorithm for large scale complex problems from sociology, engineering, Topology, Graphs, Biology, etc. At this juncture, we anticipate that our proposal ?nds various applications in real world industrial problems such as power systems, its transmission expansion planning, data systems and wireless technology. References 1. Bill, N.M., David, M.R.: Total productive maintenance: a timely integration of production and maintenance. Prod. Inven. Manag. J. 33(4), 6–10 (1992) 2. Bevilacqua, M., Braglia, M.: The analytic hierarchy process applied to maintenance strategy selection. Reliab. Eng. Syst. Saf. 70(1), 71–83 (2000) 3. Doganay, K.: Applications of optimization methods in industrial maintenance scheduling and software testing. M¨alardalen University Press Licentiate Theses, School of Innovation, Design and Engineering, 180 (2014) 4. Shen, M., Peng, M., Yuan, H.: Rough set attribute reduction based on genetic algorithm. In: Advances in Information Technology and Industry Applications, The Series Lecture Notes in Electrical Engineering, vol. 136, pp. 127–132 (2012) 5. Sobh, T., Elleithy, K., Mahmood, A., Karim, M.: Innovative algorithms and tech-niques in automation, Industrial Electronics and Telecommunications (2007) 6. Hillier, M.S., Hillier, F.S.: Conventional optimization techniques, evolutionary opti-mization. Int. Ser. Oper. Res. Manag. Sci. 48, 3–25 (2002) 7. Miettinen, K., Neittaanmaki, P., Makela, M.M., Periaux. J.: Evolutionary algo-rithms in engineering and computer science: recent advances in genetic algorithms. In: Evolution Strategies, Evolutionary Programming, Genetic Programming and Industrial Applications, Wiley (1999) 8. Kar: Genetic algorithm application (2016). http://business-fundas.com/2011/ genetic-algorithm-applications/. Accessed 27 June 2016 Advances in Genetic Algorithm 747 9. Deb, K.: Optimization for Engineering Design: Algorithms and Examples. Prentice Hall of India Private limited, New Delhi (2005) 10. Tiwari, B.N.: Geometric perspective of entropy function: embedding, spectrum and convexity, LAP LAMBERT Academic Publishing, ISBN-13: 978-3845431789 (2011) 11. Gupta, N., Tiwari, B.N., Bellucci, S.: Intrinsic geometric analysis of the network reliability and voltage stability. Int. J. Electr. Power Energy Syst. 44(1), 872–879 (2010) 12. Bellucci, S., Tiwari, B.N., Gupta, N.: Geometrical methods for power network analysis. Springer Briefs in Electrical and Computer Engineering (2013). ISBN: 978-3-642-33343-9 13. Nelson, B.L.: Optimization via simulation over discrete decision variables. In: Tuto-rials in Operation Research, INFORMS, pp. 193 – 207 (2010) 14. Gupta, N., Shekhar, R., Kalra, P.K.: Computationally e?cient composite transmis-sion expansion planning: a Pareto optimal approach for techno-economic solution. Electr. Power Energy Syst. 63, 917–926 (2014) 15. Gupta, N., Shekhar, R., Kalra, P.K.: Congestion management based roulette wheel simulation for optimal capacity selection: probabilistic transmission expansion planning. Electr. Power Energy Syst. 43, 1259–1287 (2012) 16. Goldberg, D.E.: Genetic Algorithms in Search Optimization and Machine Learning. Addison-Wesley, Reading (1989b) 17. Chung, H.S.H., Zhong, W., Zhang, J.: A novel set-based particle swarm optimiza-tion method for discrete optimization problem. IEEE Trans. Evol. Comput. 14(2), 278–300 (2010) 18. 
Liang, Y.C., Smith, A.E.: An ant colony optimization algorithm for the redundancy allocation problem (RAP). IEEE Trans. Reliab. 53(3), 417–423 (2004) 19. Sharapov, R.R.: Genetic algorithms: basic ideas, variants and analysis, Source: Vision Systems: Segmentation and Pattern Recognition, ISBN 987-3-902613-05-9, Edited by: Goro Obinata and Ashish Dutta, pp.546, I-Tech, Vienna, Austria, June 2007. Open Access Database www.i-techonline.com 20. Holland, J.H.: Adaptation in natural and arti?cial systems, University of Michigan Press, Ann. Arbor, MI (1975) 21. Goldberg, D.E., Lingle, R.: Alleles, loci, and the TSP. In: Proceedings of the 1st International Conference on Genetic Algorithms, pp. 154 – 159 (1985) 22. Malhotra, R., Singh, N., Singh, Y.: Genetic algorithms: concepts, design for opti-mization of process controllers. Comput. Inf. Sci. 4(2), 39–54 (2011) 23. Spears W.M., De Jong, K.A.: On the virtues of parameterized uniform crossover. In: Proceedings of the 4th International Conference on Genetic Algorithms (1994) 24. Gupta, D., Gha?r, S.: An Overview of methods maintaining diversity in genetic algorithms. Int. J. Emerg. Technol. Adv. Eng. 2(5), 263–268 (2012) 25. Ming, L., Junhua, L.: Genetic algorithm with dual species. In: International Con-ference on Automation and Logistics Qingdao, pp. 2572 – 2575 (2008) 26. Cantu-Paz, E.: A survey of parallel genetic algorithms. Calc. Paralleles Reseaux Syst. Repartis 10(2), 141–171 (1998) 27. Aggarwal, S., Garg, R., Goswani, P.: A review paper on di?erent encoding schemes used in genetic algorithms. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1), 596– 600 (2014) 28. Baluja, S., Caruana, R.: Removing the genetic form the standard genetic algorithm. In: Proceedings of the 12th International Conference on Machine Learning, pp. 38 – 46 (1995) 748 N. Gupta et al. 29. Srinivas, M., Patnaik, M.: Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Trans. Syst. Man Cybern. 24(4), 656–667 (1994) 30. Goldberg, D.E., Sastry, K., Kendall, G.: Genetic algorithms. In: Burke, E.K., Kendall, G. (eds.), Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques. Springer, Science + Business Media, NY (2014) 31. Cipra, B.A.: The Best of the 20th Century: Editors Name Top 10 Algorithms, SIAM News 33(4) (2016). https://www.siam.org/pdf/news/637.pdf. Accessed 27 June 2016 32. Man, K.F., Tang, K.S., Kwong, S.: Genetic algorithm: concepts and applications. IEEE Trans. Ind. Electron. 43(5), 519–534 (1996) 33. Jamil, M., Yang, X.: A Literature survey of benchmark functions for global opti-mization problems. Int. J. Math. Model. Numer. Optim. 4(2), 150–194 (2013) 34. https://www.sfu.ca/~ssurjano/scha?er2.html 35. https://www.sfu.ca/~ssurjano/michal.html 36. https://www.sfu.ca/~ssurjano/stybtang.html 37. Icl?anzan, D.: Global optimization of multimodal deceptive functions. In: Blum, C., Ochoa, G. (eds.) Evolutionary Computation in Combinatorial Optimisation. EvoCOP 2014. Lecture Notes in Computer Science, vol. 8600. Springer, Berlin, Heidelberg (2014) 38. Li, Y.: The deceptive degree of the objective function. In: Wright A.H., Vose M.D., De Jong K.A., Schmitt L.M. (eds.) Foundations of Genetic Algorithms. FOGA 2005. Lecture Notes in Computer Science, vol. 3469. Springer, Heidelberg (2005) 39. Mishra, S.K.: Minimization of Keane’s bump function by the repulsive particle swarm and the di?erential evolution methods, May 2007 (2007). SSRN:http:// ssrn.com/abstract=983836 40. 
Karaboga, D., Akay, B.: A comparative study of artificial bee colony algorithm. Appl. Math. Comput. 214(1), 108–132 (2009)
41. Bozorg-Haddad, O., Solgi, M., Loáiciga, H.A.: Invasive weed optimization. In: Meta-Heuristic and Evolutionary Algorithms for Engineering Optimization, pp. 163–173. Wiley (2017)
42. Eusuff, M., Lansey, K., Pasha, F.: Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization. Eng. Optim. 38(2), 129–154 (2006). Taylor & Francis

Second-Generation Web Interface to Correcting ASR Output

Oldřich Krůza and Vladislav Kuboň

Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University, Malostranské nám. 25, Prague, Czech Republic
{kruza,vk}@ufal.mff.cuni.cz

Abstract. This paper presents a next-generation web application that enables users to contribute corrections to an automatically acquired transcription of long speech recordings. We describe differences from similar settings, compare our solution with others and reflect on the development since the now six-year-old work we build upon, in the light of the progress made, the lessons learned and the new technologies available in the browser.

Keywords: Speech recognition · Transcription · Community-driven · Web standards

1 Introduction

In 2012 [7], we presented a setting where a community of users contributed corrections to automatically transcribed talks of a single speaker. Now that browser technologies have evolved drastically and we have been able to observe the usage patterns and discover shortcomings of the solution at hand, we have created the next generation of the programme. We shall describe the steps taken and discuss their motivation and impact. The application we describe is a part of a larger system that deals with Makoň's recordings. It consists roughly of (1) the corpus itself, (2) an ASR system trained specially for it and (3) a web interface for the users. These three parts form a whole where the ASR gives a baseline transcription, the users correct it and the corrections are fed as further training data to the acoustic and language models. In this paper, we focus on the web interface.

1.1 Motivation

Our project focuses on the collection of recordings of Karel Makoň [5] (*1912, †1993), the author of numerous books, translations and commentaries on works of a spiritual and religious nature, who was influenced by trances during recurring surgery without anaesthesia at the age of 6, ecstasies in his youth and finally facing and surviving certain death in a Nazi concentration camp, after which he experienced enlightenment. He gave talks in a narrow circle of friends, and the recordings in our care were taken between the early 1970s and 1991, spanning about 1000 h in total. All of Makoň's work deals more or less directly with a single topic: entering the eternal life before the physical death. He draws mainly from Christian symbolism and builds on Christian mysticism and the ancient traditions of India and China. Makoň's written works present his teachings in a systematic, comprehensive fashion, while the recordings offer bonuses: talks tailored to the audience, answers to questions, personal experiences, behind-the-scenes notes on the books etc. The archive is freely accessible1 under the CC-BY license.

2 Differences to Other Settings

The spoken corpus is about 1000 h of a single speaker.
Our aim is to have a transcription as good as possible for the purpose of searching and further, higher-level processing of the data. There is a pool of people interested in the talks, who on one hand are the force we can try to employ and on the other hand are the consumers of our e?ort, our target group so to speak. The web application should therefore combine the two purposes: 1. serve its user with making the content available in a manner as good as possible and 2. animate the user to give as much and as high-quality contribution as possible. To our best knowledge, there is no other project with a comparable setting. However, we can compare single aspects found in other applications. 2.1 Transcription Apps The best widespread match to our task is that of creating an application for transcribing speech recordings. Let us compare the two tasks, pointing out the main points of di?erence. For reference, we take (1) Transcriber2 , a classical open-source program written in TCL, (2) oTranscribe3 , a free modern web-based transcription tool and (3) Transcribe4 a commercial web-based transcription tool. The numbers in the bullet list below denote the programs our statement applies to. For example, of the three only Transcriber allows speaker annotation, hence there is only the number (1) standing at the second list item. 1 https://lindat.m?.cuni.cz/repository/xmlui/handle/11372/LRT-1455. 2 trans.sourceforge.net. 3 otranscribe.com. 4 transcribe.wreally.com. Second-Generation Web Interface to Correcting ASR Output 751 • transcription applications: • our application: • are optimized for the case where there is no transcription available and it must be acquired from scratch; (1,2,3) • always assumes a prior transcription is available; • allow annotation of speakers; (1) • assumes all utterances come from the same speaker; • need no quality control: the user is free to enter whatever transcription she pleases and the ultimate measure is her satisfaction; (1,2,3) • needs the transcription to be accurate because it is used as training data for the acoustic model; • use alignment on the level of phrases, if any; (1) 5 • uses alignment on the level of words; • are user-centric: the user transcribes whatever acoustic data they choose; (1,2,3) • is data-centric: the whole application with all its tools and persons revolves around the data set; • assumes the user wants to transcribe; (1,2,3) • assume the user wants to listen and possibly read along and we want to animate her to submit transcriptions; • has no shared data between users; (1,2) 6 • must count with collisions. Despite of these di?erences, we can still learn a lot from transcription soft-ware. The ease of performing common tasks, like pausing, resuming and rewind-ing is crucial for the user experience and in e?ect for the amount of submissions that we receive. Also, the way the text is displayed synchronously to the audio played has a big impact and the approaches have a lot of space for variation. 2.2 Wiki Where our application diverts from transcription software, it mostly resembles a wiki: a community platform that serves its users including the contributors but where the quality of the contributions is essential, while the contributor’s satisfaction alone is of less importance. One major point of di?erence to a wiki is that wiki is creative, whereas our task is mechanical. The user has basically no room for their own invention: providing a di?erent than correct transcription is seen as an error. 
5 Transcriber explicitly aligns the text with speech, while the other two merely support addition of timestamps into the transcription. 6 Transcribe supports team co-operation. 752 O. Kr°uza and V. Kubon? Popular wikis have good measures for edit con?icts, which is where we could learn some lessons. However, so far there was no need to do that because 1. if we always simply take the most recent version of a segment, the result stays consistent even if a piece from user A comes into a larger transcription of user B; 2. our user base is so far limited to a small community who have no problem coordinating with each other. We plan to expand to broader public soon though. With regard to the transription as presented to the user, a submitted seg-ment of transcription always overwrites the present version but we keep all the submissions in a database, so undo operations, clustering submissions by their author etc. are possible but we had little need for this so far. 2.3 Corpora Our project is not the ?rst involving community-driven care of a corpus. We can mention the Manually annotated sub-corpus [6], where annotations of various kinds are gathered from volunteers, or the Wikicorpus [10], a corpus of Wikipedia articles with some linguistic annotation. Our project may reach profound sim-ilarities with these in the future, when we no longer focus on the transcription itself but rather on annotation. There is also CzEng [3], the Czech-English Parallel Corpus, where a large part of the translation is provided by volunteers. The similarity in setting is considerable as both projects involve a machine-produced erroneous derivative of the original material (in our case audio transcriptions, in the case of CzEng Czech translations of English texts), and a community of volunteers correct these. But the speci?cs of the projects bring di?erent challenges and dictate di?erent approaches. Marge (2009) [8] investigates using The Mechanical Turk to obtain audio transcriptions. Mihalcea (2004) [9] o?ers a web interface for word-sense disam-biguation and focuses mostly on annotator con?ict resolution. 3 Description of the Web Application 3.1 Usage We have no special assumption of the user beyond basic computer usage skills and understanding the audio. We assume no prior training. There is a manual for clearing common points of confusion. The main message in it is that any-thing that is to be transcribed, should be transcribed with respect to phonetic precision, even if it results in nonsensical character strings. Anything except words spoken by the one speaker of interest is to be left untranscribed, including noise or speech by other persons7 . Incomprehensible 7 In our data, other speakers represent a negligible fraction but we may later add support for speaker annotation. Second-Generation Web Interface to Correcting ASR Output 753 words are to be left uncorrected (the ASR output kept) if the phones are unclear. If the phones uttered are clear but it is not clear what word was meant, the word may be transcribed phonetically. 3.2 Implementation The application consists of several views: 1. the start page where all recordings are listed and each points to a detail view, 2. the detail view, where a recording can be played back, its transcription is displayed and can be corrected by the user, 3. the search page, where hits to a search query are listed and point to corre-sponding positions in the recordings, 4. static pages with general information, contact etc. 
We shall only discuss the detail view as the others are not relevant to this paper. Figure 1 shows the interface during playback. Figure 2 shows the interface while a segment is being edited. The interface in the ?gures is conveniently shown in English, although in reality it is in Czech. Legend to Fig. 1: 1. Header with – app name linking to start page, – about link, – search ?eld and – username input ?eld; 2. Identi?er of the recording; 3. Automatically transcribed segments in grey; 4. Manually transcribed segments in black; 5. Currently played-back word highlighted by yellow background; 6. Marked word highlighted in regent st. blue; 7. Marked word info: – occurrence: the word with contextual capitalization and punctuation as it appeared in the text (currently being edited as the selected initial letter reveals), – form: normalized word form as it appears in the word list, – pronunciation: Czech phonetic transcription of the word, – position: time of the beginning of the word in seconds from the start of the recording; 8. Tools for storing: – direct links to the audio ?les, – selecting the whole transcription for easy pasting, – storing the decoded recording in the browser’s IndexedDB; 9. Graphical equalizer for compensating narrow-band noise; 754 O. Kr°uza and V. Kubon? Fig. 1. Web interface during playback 10. Audio playback controls: – play/pause button, – current playback position, – playback scrollbar, – total recording length; 11. Current position re?ected in URL fragment. Legend to Fig. 2: 1. Selecting a text range with the mouse de?nes the segment the user is about to transcribe; 2. The edit tool with – text area pre?lled with the current transcription, – playback button that plays the corresponding segment, – save button and – download-segment button, which initiates a ?le-save action for the audio segment corresponding the the selected text. The synthesis of the down-loaded ?le takes place in the browser. The commonest tasks have keyboard shortcuts: ctrl+space for play/pause and ctrl+enter for submitting a correction. Second-Generation Web Interface to Correcting ASR Output 755 Fig. 2. Interface in the state of editing a segment 3.3 Displaying the Transcription Many transcription programs show the transcription as a vertical list of utter-ances, see Fig. 3 for an example of Transcriber. We attribute this to the fact that the atomic elements of the transcription are the user-entered utterances and their boundaries are reliable. In our case, the atomic elements are words. There are sentences, sure, but the segmentation to sentences by the ASR is very unreliable, so we want it to be natural to transcribe a segment overlapping sentence boundaries. This is one of the reasons why we display the transcription basically as a single wrapped line. Performance Challenge. The transcription display was designed to have these features: 1. Currently played-back word should be highlighted; 2. Manually transcribed segments should be clearly distinct from automatically transcribed ones; 756 O. Kr°uza and V. Kubon? 3. Selecting one or more words with the mouse should trigger transcription mode for the selected text; upon a successful save, this should be merged into the display; 4. Clicking a word should bring up its context info (we call this the marked word as the term selected word is already taken); 5. The whole transcription should be shown at once for easy searching; 6. The page should be responsive. Fig. 3. 
A screenshot of transcriber These requirements are harder to combine than it may seem. Notably respon-siveness is hard to combine with all of the other ones. Why is that so? Points 1 through 4 call for every word to be wrapped in its own element. Point 5 and the median count of words in a transcript of about 6000 yield 6000 elements just to show the text. Although this may not seem like a big deal, it does a?ect the responsiveness and memory footprint of the page. Second-Generation Web Interface to Correcting ASR Output 757 In the original version, we solved this by sacri?cing point 5: only 3 lines of text are shown with the current word kept on the middle line as shown on Fig. 4.8 Thanks to the development in the web standards and their support from popular browsers, a solution is possible. Fig. 4. Original web interface from 2012 Solution. We can use the fortunate fact that manually transcribed words and automatically transcribed ones tend to form larger chunks. The average number of words per submitted segment is 7.9. Furthermore, the absolute majority of such segments are adjacent to other manually transcribed chunks. 9 Hence, wrap-ping each chunk of consecutive manually or automatically transcribed words in an HTML element is no problem, which solves point 2. Point 3 can be implemented using document.selection and the Range objects, which let us ?nd out the innermost HTML element and text o?set of the start and end of the textual selection. Since we know the length of each word, this allows us to map the selection to the corresponding words in the transcription. 8 The current word is on the top line on the screenshot because it is at the beginning of the recording. 9 The median number of chunks is 1 (most recordings have no manually corrected segments), maximum is 1109. Median only counting touched recordings is 8. 758 O. Kr°uza and V. Kubon? Points 1 and 4 can be implemented in two ways: We could either wrap the current and marked word in a dedicated element or we could draw a highlighting rectangle beneath the word. Wrapping the word would de?netely be more robust and less error-prone but the constant changes in the DOM during playback with possible frequent re?ows speak against it. Finding the exact position of each word and draw-ing a rectangle precisely beneath it (beneath on the z-axis; over it in the x-y sense), avoiding positioning issues and keeping the rectangle position synced even after scrolling/window resizing is de?nitely a challenge but we chose this way nonetheless. The performance gain for the majority of the usage time out-weighs the possible errors in the corner cases, more so since the eventual errors are not critical and mostly remedied by further playback. The e?ciency of repositioning a rectangle is supported by the fact that we can calculate the coordinates of all rendered words once and only recalculate them in two cases: (1) In the rare event of screen resize and (2) when a corrected segment is merged into the transcription, in which case we only need to recalculate for the words further in the document. 10 Manual/Automatic Distinction. As shown on Fig. 1, we draw automatic transcription in grey and manual one in black. Why did we choose this instead of normal/boldface? Firstly, the normal font is optimal for reading. Boldface is meant to highlight spots in text. It becomes bulky when applied on long continuous passages. The automatic transcription contains many errors, so there is no sense in optimizing it for best reading experience. 
There is also another practical reason. When the two font variants only di?er in color, and a segment of automatic transcription is left intact and submitted as correct transcription, its merge-down into the displayed text causes no re?ow, which saves us computations and raises responsiveness. It may seem like a rare use case but we believe that identifying correctly recognized words is a legitimate way of contribution, so why not optimize for it? Still, the underlying HTML tags are and because that way the distinction persists when copy-pasting the text from the web page to a rich text editor. 3.4 Ergonomy It is clear that the ease of use is crucial in our case where the user is supposed to perform a requiring, tedious task with repeated steps, especially since it is our interest more than hers that she performs them. We compared our setting with that of transcription apps in Sect. 2.1, pointing out lessons to learn. Let us now look at some speci?c points and their actual (lack of) implementation. 10 We could even stop the recalculation as soon as we ?nd that the new horizontal coor-dinate of a word is left untouched, and add the di?erence in the vertical coordinate to all subsequent words, i.e. when a line stays the same, so do all below it. Second-Generation Web Interface to Correcting ASR Output 759 Keyboard Shortcuts. One of the most profound measures in ergonomy are keyboard shortcuts. The most common task is pausing and resuming playback. Both oTranscribe and Transcribe use the esc key for that, and Transcriber uses the tab key. We chose ctrl+space combination. We argue that esc is not the best of options for desktops because the distance the ?ngers have to travel from the alphanumeric keys causes a noticeable delay. This can lead to missing a pause between words. The tab key as chosen by Transcriber is a splendid choice from the ergonomy point of view and there is no reason not to use it in a dedicated user interface. However, in the browser, where the tab key has as native use, re-binding it could lead to confusion and irritation. The space bar is probably the easiest-to-?nd key in all situations and dedicating ctrl to all application-speci?c commands as opposed to single keys lends a sense of consistency, we believe. This is mere personal experience though, as we had no resources so far to perform serious research to support these statements. The only other keyboard shortcut we support is ctrl+enter for submit-ting the correction. We chose this to stay consistent using the ctrl key and because this shortcut is familiar to users of many instant messengers, like the Facebook chat or the once popular o?cial ICQ client. Also, requiring a key combination prevents accidental submission, which is desirable as we only want double-checked, guaranteed-precise ones. In comparison, Transcriber uses the bare enter key to separate utterances. oTranscribe and Transcribe allow free formatting with no explicit alignment, so using the enter key to split utterances by lines is the user’s choice. Missing Features. One of the features that Transcribe, the only commercial tool in our reference list, o?ers is setting up keyboard shortcuts for common words. We have not implemented this because ideally, common words should be covered by speech recognition. However, it could be sensible to implement it anyway. The reason is that a word can be very rare globally and thus poorly recognized by ASR but very common in a speci?c passage. This particularly regards named entities. 
Another point in our ergonomy to-do list is lifting the need to select a segment prior to correcting it. If the transcription was simply editable, it could increase the ease of use rapidly. We would have to automate the selection of segment to send for forced alignment but we could probably do a better job than the user in the end. 3.5 Mechanics of Submitting a Corrected Segment As stated above, when the user selects at least one character with the mouse, the application enters the state of correcting the selected transcription padded to whole words. In this mode, the transcription to correct is shown in a text area and the global playback controls are replaced by those that only allow playback of audio corresponding to the selected transcription. Once the user believes that the content of the text area corresponds precisely to the words uttered, she hits the save button or the ctrl+enter keyboard 760 O. Kr°uza and V. Kubon? shortcut. This starts an asynchronous HTTP request to the back-end, where parametrized (MFCC) versions of the recordings are stored, along with the new transcription and the time positions of the beginning and end of the segment. The server then cuts o? the corresponding segment from the parametrized recording, runs forced alignment on it with the provided transcription with a threshold to reject bad matches. If the forced alignment fails, an error response is sent back and the transcription is not merged into the original. In the case of a success, the correction is merged on one hand on the server side and pushed to a CDN, on the other hand it is merged into the transcription word array in the JavaScript application. This redundancy warrants that we do not have to reload the whole transcription every time a segment is corrected. React ensures the updating of the chunks, and the coordinates of the words further in the document are recalculated for word-highlighting purposes. Apart from this, the version of the transcription to the recording is updated. This is because the transcription ?les have a long cache time because normally, they do not change at all. At the page load, the versions of all transcriptions are loaded and used as cache busters. This enables us to use an external CDN and cache e?ectively. 3.6 Implementation Details Audio Engine. The adoption of Web Audio API [2] allowed for big improve-ments in comparison with the original implementation. There are four major di?erences between using the HTML