Using Platform Adventure Mechanics for Gamification Research

 Jun 7, 2016   Digital Essays, iDMAa Conference Journal 2016

Digital Essay by Joseph Fanfarelli and Rudy McDaniel, University of Central Florida

This article is an extension of the work presented at the iDMAa conference 2015 at East Tennessee State University.




Introduction and Background

Video games are popular objects of critique within the field of digital media. As analyzed cultural artifacts, they demand careful methods of investigation due to their complex compositions of interactivity, narrative, art, and sound. However, while games are complex and interactive, as René Alberto García Cepada notes, they are not boundless; emergent gameplay still depends upon the possibilities envisioned by designers and developers (para 3). The artistic and cultural dimensions, such as those Cepada discusses in his treatment of art history within video games, are important to consider in the context of interactive boundaries and designer affordances. Boundaries might include the beginnings and ends of particular game levels, in an operational sense, but also the specific parameters imposed by research surveys (such as a focus on better understanding player motivation). Better understanding different types of game boundaries is useful for homing in on particular player behaviors using empirical methods. For example, understanding a particular action taken by a player at a particular time, and recognizing the impact of that action within the context of a larger system, is an observation framed by precise boundaries. This is precisely the type of strategy used in gamification research, the focus of this essay.

Figure 1: Early Gamification Strategies (Image by Benjamin Chun)

Gamification is defined in multiple ways, dependent upon context and purpose, but a basic definition is “the use of game mechanics to make learning and instruction more fun” (Kapp xxi). Such research focuses on targeted game mechanics (e.g., points, badges, leaderboards) for specific purposes both inside and outside of games (Landwehr 64; see also Figure 1). From enhancing learning and training to encouraging positive user behavior, well-designed gamification can enhance a broad assortment of non-game activities. However, effective design is not a guarantee; games are complex systems and include an assortment of features, strategies, and functionalities, many of which have proven more complex than originally anticipated. For example, Karl Kapp’s research and Brian Landwehr’s work tell us that simply adding points or achievements to a system is insufficient (Kapp 220; Landwehr 65). As Landwehr notes, “Simply adding badges, points, and leaderboards does not ensure success. Gamification also carries the concern of overleveraging extrinsic rewards that have weaker long term benefit…” (65). Gamified mechanics must be designed and implemented strategically in order to achieve the intended purpose; when gamification elements are designed effectively for the correct users and the correct context of use, they frequently succeed (Hamari, Koivisto, and Sarsa).


1 Establishing an Empirical Games Testbed

Empirical research is important for evaluating the complexities of these game-based strategies for effectiveness. As programmed artifacts, games rely on numbers and calculations; such quantitative data is also useful for homing in on patterns of player behavior. Such processes are not unfamiliar to the world of game design; for example, Eric Preisz and Ben Garney describe the “optimizer’s mantra” as “benchmark, measure, detect, solve, check, and repeat” (1-2), a process analogous to the scientific method used in experimental research.

While gamification methods are best evaluated when implemented in the specific domain or software in which they are used, this is a luxury that is not afforded to all researchers. For instance, appropriately-sized participant samples can be difficult to obtain when software is highly specialized. The software’s original developers may be unwilling to commit the resources to modify the software if the gamification method has not yet been proven effective. Or, sometimes a researcher is simply looking to conduct general research to see if the intervention has any usefulness at all before considering domain-specific scenarios.

When implementing the gamification intervention into pre-existing software is not feasible or is otherwise undesirable, digital media researchers can develop their own experimental testbed using a combination of game development technologies and research tools. This can be an attractive approach, enabling the development of a system that is completely under the researcher’s control, allowing for experimentally controlled conditions, and reuse across multiple experiments with minimal modification. In other words, the development of a carefully designed testbed facilitates experimental validity and reliability (as a result of higher levels of control) and efficiency (through reuse).

Figure 2: Medulla Video Game

Despite the potential benefits of customizable gamification features, academic research has given little attention to this topic. This paper fills a current gap in the literature by examining the methodologies for developing and working with such a configurable game. It explores the process of developing, testing, evaluating, and implementing an experimental testbed game for the empirical evaluation of gamification interventions. It is important to note that the focus of this article is not to consider the specific data from such a testbed, which does not yet exist, but rather to explain the design decisions surrounding the creation of this product. However, to ground these ideas in practice, this article draws upon specific design observations from a real-world example, Medulla (Fanfarelli and Vie 7), a gamification testbed game teaching brain structure and function. Medulla (Figure 2) is used to demonstrate the practical application of the design strategies proposed in this article.


2 Game Design for Testbeds

We follow Katie Salen and Eric Zimmerman’s definition of a video game as “a system in which players engage in an artificial conflict, defined by rules, that results in a quantifiable outcome” (Salen and Zimmerman 80). We chose this interdisciplinary definition because it is carefully derived from eight other definitions, including those proposed by board game designers, writers, philosophers, computer game designers, and anthropologists. Salen and Zimmerman admit that such a definition may be problematic in certain circumstances, such as when describing puzzles or role-playing games with no definite beginning or end. However, the definition suits the purposes of this research, which examines platform video games specifically. In platform video games, there are concrete objectives that lead to a quantifiable outcome. Objectives include the acquisition of tokens such as virtual coins, the accumulation of points, and, ultimately, the completion of each individual level. There are also sources of conflict (e.g., spikes, enemies, holes which must be leapt over, etc.) and rules that mediate character progress (e.g., the speed at which a character can move, or the maximum angle of incline at which they can climb a hill).

While many detailed and useful resources are devoted to the process of game design (e.g., those developed by Richard Rouse III, Chris Crawford, Jesse Schell, and Tracy Fullerton), the goal of this paper is not to explore game design at large, but rather to focus the analysis on how to construct one particular type of game (the platform adventure game) to serve as a controlled experimental testbed for examining gamification interventions. By limiting scope to platform adventure games, elements particular to other genres (e.g., multiplayer design, open sandbox-style quests, complex resource allocation mechanics) can be ignored. These genres should be examined in future research, perhaps after testbed designs for simpler games have already been established.

Accordingly, this article considers game design for a fairly straightforward genre – the platform adventure. Such a game offers gameplay experiences in which players can run, jump, collect items, engage with enemies, and solve puzzles. Underneath these surface mechanic requirements, however, is the additional need for the system to be supple enough to accept implementation of a research testbed that can apply a variety of gamification strategies as experimental manipulations during runtime. This malleability enables efficiency as a function of testbed reusability.
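To make the idea of runtime-configurable gamification manipulations concrete, the sketch below shows one way a testbed might represent experimental conditions as intervention toggles, so a single build can serve multiple studies. All names here are hypothetical and are not drawn from Medulla's actual implementation.

```python
# Hypothetical sketch: gamification interventions as runtime toggles,
# letting one testbed build serve several experimental conditions.

from dataclasses import dataclass

@dataclass(frozen=True)
class Condition:
    """One experimental condition: a named set of intervention toggles."""
    name: str
    badges: bool = False
    narrative: bool = False
    leaderboard: bool = False

CONDITIONS = {
    "control": Condition("control"),
    "badges_only": Condition("badges_only", badges=True),
    "badges_narrative": Condition("badges_narrative", badges=True, narrative=True),
}

def load_condition(name: str) -> Condition:
    """Look up the condition assigned to the current participant."""
    return CONDITIONS[name]

cond = load_condition("badges_only")
print(cond.name, cond.badges, cond.narrative)
```

Keeping the toggles in one place like this also documents exactly what differs between conditions, which supports experimental control and later replication.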

In order to create a game that supports a variety of gamification mechanisms, a testbed game must be flexible, enjoyable, and accessible:

  1. Flexible: the game must be designed flexibly, so that players can interact with it in many different ways. This speaks to the importance of nonlinearity in game design (Rouse III 119) and also provides opportunity to implement various gamification strategies (e.g., badges, narrative, competitive elements) to redirect participant play styles and then measure their ability to react to these changes.
  2. Enjoyable: the game needs to be enjoyable for players, so that they will continue playing through the game in the different ways afforded. This allows researchers to assess the impact of the gamification strategy on both overall time on task as well as behavioral redirection.
  3. Accessible: the game needs to be accessible to players of varying skill levels. The game must offer both sufficient challenge to advanced players and appropriate accessibility to novice players. However, variations designed for accessibility should not be meaningful in ways that affect the variables being measured. Otherwise, there is the potential to introduce experimental confounds.

While this article cannot serve as a full instructional guide to platform testbed game design, it aims to lay the foundation for such work. Given the importance of play time and player skill to our design requirements, these issues will be described in this paper. Following that, we examine the relationship of the testbed game to testing, experimentation, and data analysis. First, however, we discuss the reasons to use a 2D platformer action game as the basis for a testbed game.


3 Using a 2D Platformer Action Game

A 2D platformer action game is characterized by activities such as running, jumping, collecting, and fighting within a two-dimensional world (McDaniel and Fanfarelli 24). The focus on X and Y boundaries within this type of world is useful because it often shortens development time. For instance, while three-dimensional games more accurately simulate real-life spaces, 3D spaces introduce more complexity into the development process due to the introduction of a z-axis. As a result, 3D models require more time to create than 2D sprites, just as physics in a 3D space take more time to perfect than 2D physics. Functionally, 2D development enables researchers to get their testbed up and running more quickly.

Additionally, player ability and prior experience are important factors in maintaining experimental control. 2D spaces hold another advantage here – they are simpler to navigate. Three-dimensional environments introduce another dimension of complexity to navigation, a complexity that is overcome only with the practice that novice gamers lack. In experimental conditions, where equality and control are desirable and necessary, individual differences in navigation expertise can be confounding factors. Two-dimensional environments simplify navigation across all participants. While some differences will inevitably exist, two-dimensional environments reduce the potential for navigation to become a frustrating experience by lowering the barrier of entry for efficient navigation.

Figure 3: VVVVVV by Terry Cavanagh

Action platformer games are useful because the genre is generally well known, due in part to the popularity of classic platforming games like Super Mario Bros. and more recent examples such as VVVVVV (Figure 3), and is otherwise easy to explain to the uninitiated. The genre also allows for a distinct degree of openness that lends itself well to a game’s reusability across multiple studies and gamification implementations. Thus, this type of game is recommended for testbeds when non-expert players will be included in the dataset. The recommendations that follow in this paper assume that such a genre is being used.

The game examined with the proposed design recommendations fits this criterion. Medulla is a 2D platformer game that teaches brain structure and function information. It was developed to be a gamification testbed and has been used to assess the effects of inclusion or omission of digital badges and fantasy-based narrative (Fanfarelli iii). Medulla is referenced throughout this article in comparison to the proposed design recommendations to provide an example of real-world application. Now that the genre scope has been defined, we present the primary focus of this paper – considerations for the design of a useful testbed for gamification research, built with experimental control in mind. Specifically, we will address the topics of play time, player skill, choice of game engine, game testing, and the subsequent experimentation and data analysis.


Testbed Design Considerations

1 Play Time


Figure 4. Play Time should be carefully considered in gamification research. More is not always better. Image captured via Valve’s Steam client.


Play time, the manifestation of time on task in a gaming environment, has been widely used as an engagement metric in gaming studies (Xie, Antle, and Motamedi 192). Such work posits that a player who voluntarily plays one game for longer than another game probably finds the first game more engaging. With this in mind, voluntary play time, where participants can choose when to stop playing, can be used to discriminate between engagement levels of participants in two conditions using the same game – one which includes a gamification intervention, and one which does not. However, the game must be carefully designed in order to accomplish this purpose without introducing confounds. This is the topic for this section.

The ceiling effect, for instance, is a concern. A ceiling effect occurs when participant scores on a measure (e.g., play time) tend to cluster close to the maximum value obtainable on that measure (Austin and Brunner 97). If a game contains only enough content to support a maximum of five minutes of meaningful gameplay, such an effect is likely to occur. After all, even poorly designed games can foster a few minutes of gameplay, simply due to the novelty involved in playing a new game. This leaves little room for play time differences to exist between control group participants and those in an experimental group, no matter how engaging an intervention is included.

Such a situation is problematic; by definition, variables that exhibit a ceiling effect are unable to accurately discriminate between participants with high and low scores. This is likely to confound the results obtained from any significance testing that may be conducted on the data, increasing the Type II error rate. Consider a scenario where a true difference exists in play time between participants in a control condition (CC) and participants in an experimental condition (EC), where the intervention that is present in EC should promote higher play times. If CC participants are willing to play through all of the content, EC participants will never have the opportunity to play longer, even though the intervention should increase play time. Here, the testbed is responsible for the play time values, with no room for the intervention to make an impact. To mitigate this problem, the game should be designed with enough content to offer variable lengths of play time, depending on the participant’s engagement, motivation, or whatever other specific constructs are planned to be measured. To ensure this goal was met successfully, experimenters using platform games to collect data should examine the mean, minimum, and maximum values for play time. For example, Medulla had a mean play time of 48.50 minutes, and a range of 35-70 minutes (Fanfarelli 49). While this seems to indicate a reasonable range of play time, there does not seem to be any research defining how much content a game of this nature should offer for various experimental scenarios.
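Once play-time data are collected, a quick ceiling-effect sanity check can be scripted. The sketch below uses illustrative numbers (not Medulla's actual data) and a hypothetical content cap; a large share of participants clustered near the cap suggests the testbed needs more content.

```python
# Sketch of a ceiling-effect check on voluntary play time (minutes).
# The play times and content cap below are illustrative only.

play_times = [35, 41, 44, 48, 52, 55, 61, 66, 70, 70]
content_cap = 70  # maximum play time the game's content can support

mean_time = sum(play_times) / len(play_times)
# Count participants within 5% of the cap (a rough, arbitrary threshold).
at_ceiling = sum(1 for t in play_times if t >= 0.95 * content_cap)
share_at_ceiling = at_ceiling / len(play_times)

print(f"mean={mean_time:.2f}, min={min(play_times)}, max={max(play_times)}")
print(f"share within 5% of content cap: {share_at_ceiling:.0%}")
```

If most values sit at or near `content_cap`, the measure cannot discriminate between conditions and more content should be added before running the experiment.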

Game designers such as Richard Rouse III acknowledge that levels are often defined by the amount of gameplay action that “feels right” to the player before offering them a reprieve and another level (450). This anchors gameplay time guidelines to individual games and their presentation of material, even in more discretely bounded games such as 2D platformers. However, more precise parameters are necessary for understanding player behaviors in an empirical sense. Future research should examine this issue more closely to identify how much play time should be offered.

2 Player Skill

Designers should constantly remind themselves that not all players are equally skilled or experienced. In particular, novice players must be accounted for. Richard Rouse III refers to this as “protecting the newbies” (246). A player on a youth basketball team may enjoy herself when playing against her peers, but placing her in a serious one-on-one match against a professional player is unlikely to be an enjoyable experience. Similarly, it is easy to create a game that is unplayable by novices. People who are motivated and inclined to design or develop a game are probably experienced players themselves. As such, they may view difficulty through expert gamer-tinted glasses, designing a game that is appropriately challenging for them to complete, but impossible for a novice. This situation is problematic and degrades the integrity of the data collected from these players. If expert player-participants are enjoying a game with an appropriate level of difficulty while novice player-participants endure frustration as they fail to complete an impossible game, the game is delivering different experiences, mediated by participant individual differences.

In other words, the game becomes unintentionally biased toward expert players. Prior skill, then, becomes a confounding variable if impactful and uncontrolled. Random assignment, which is widely accepted as a key component in proper experimental design, can lead to an unequal dispersion of expertise, predisposing one condition to greater game success and a more appropriate skill-difficulty match. Consider an experiment where the experimental condition participants are playing a game with complex narrative elements, and the control participants are playing the same game void of complex narrative. Now consider that the distribution of expertise is unbalanced; the experimental condition is primarily composed of expert players, while the control condition is primarily composed of novices. If the experimental condition experiences a good skill-difficulty match, and the control condition does not, the experimental condition will almost certainly experience greater satisfaction, a trait that is highly correlated with both appropriate challenge (Malone 162) and engagement (Wefald and Downey 91). In other words, the integrity of the experiment is compromised, and the results are both invalid and unreliable.

When analyzing data from this experiment, a researcher may falsely attribute positive effects of engagement to the intervention, while in reality, expertise played a more critical role. Here, due to a level design that catered to expert players, skill level confounded the data, resulting in a spurious relationship. While this example describes the importance of creating a difficulty that novices can enjoy, experts still require attention; designing a level only for novices will leave expert players bored and disengaged (Nakamura and Csikszentmihalyi 92), still providing a harmfully meaningful distinction between gameplay experiences.

What is a designer to do when designing for one type of player will always create an insufficient experience for another? The solution is to design for both audiences as much as possible. For example, in order to conduct an experiment to test the effectiveness of the game’s ability to teach, Medulla was designed with all players in mind. As such, levels were designed with multiple paths that led to the same final destination. See Figure 5 below (taken from Medulla). This multi-path design strategy accommodates varying skill levels. One route is filled with challenging jumps, enemies, and obstacles that are difficult for a novice gamer to overcome. Such a route is incentivized with extra rewards (e.g., power-ups, extra lives, coins), making it desirable for those who feel capable. Alternatively, the other route is more straightforward, with fewer, easier-to-overcome obstacles – a path for novices. To maintain the player’s ability to choose, these paths should frequently intersect, allowing a player to decide if they want to try a more difficult path or switch to an easier path. In this way, all players have a route that is enjoyable to overcome. They are free to choose the path they want and there is a reduced risk of player skill confounding the results.

Figure 5: Two paths lead to the same ending. The upper path requires jumping, moving platforms (1), and avoiding retracting spikes (2), but offers extra points (white orbs). The lower path is much simpler. Players can choose a path to suit their skill level.


3 Choice of Game Engine


Figure 6: GameMaker Studio


While the content of this paper is applicable to any game engine, researchers looking to begin development for the first time may be wondering where to start. Simpler-to-use engines, such as GameMaker, are most useful for individuals looking to create a simple game without investing too much time. GameMaker (Figure 6) is freely available and allows those without programming experience to quickly get a game up and running.

Figure 7: Unity Game Engine

For those desiring more control over their game, Unity may be a better option, especially if the researcher has previous coding experience. While the interface is more complex, Unity is a powerful and flexible environment that has been used to make games at all levels of development, from small indie projects to large multimillion dollar productions. It can be used for both 2D and 3D development. This software is well-documented (Ryan Creighton’s book Unity 3D Game Development by Example is one of many examples) and popular in game development, with the ability to create one game and then export it to many different platforms (e.g., Windows, Mac, Web, Mobile). It has also been successfully used in a wide range of other research contexts, such as the study of urban design (Indraprasta and Shinozaki 2), virtual exhibitions (Ni, Peng, Lina, and Jing 36), virtual reality (Wang et al. 1), and even in agriculture simulation (Hong, Qin, and Dehai 49). As a final consideration, using a game engine such as Unity that can natively create extra-game files is useful. The ability to log gameplay data and export it to a reusable file (e.g., .csv files) improves the utility of the game as a research tool by reducing the time and effort required to manually log this information (Figure 8).
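Once the engine has exported gameplay data to CSV, analysis scripts can consume it directly. The sketch below loads a small inline log with Python's standard `csv` module; the column names are hypothetical and Medulla's actual export format may differ.

```python
# Sketch: loading a gameplay log exported by the testbed as CSV.
# Column names are illustrative, not Medulla's actual schema.

import csv
import io

sample = """participant_id,event,level,timestamp_s
P01,coin_collected,1,12.4
P01,death,1,33.0
P01,level_complete,1,95.7
"""

# csv.DictReader yields one dict per row, keyed by the header line.
events = list(csv.DictReader(io.StringIO(sample)))
deaths = [e for e in events if e["event"] == "death"]
print(f"{len(events)} events, {len(deaths)} death(s) for {events[0]['participant_id']}")
```

In a real study the `io.StringIO` wrapper would be replaced by `open()` on the engine's exported file, and events would be grouped by participant for per-condition analysis.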


Figure 8: CSV Export of Key Player Data from Medulla

4 Testing

Testing is necessary with any game-based project, experimental or otherwise, and playtesting is essential to obtain systematic and quantitative information about consumers’ perceptions of games (Davis, Steury, and Pagulayan, para 3). In fact, playtesting is perhaps the most important part of game design. It identifies faults in the core game mechanics (Rouse III 484). Designs frequently do not work as well in practice as they did in theory. After all, a defining feature of games is interactivity (Cepada para 1). Until a player actually interacts with the game, the match between a developed functionality or feature and the designer’s intent is difficult to anticipate. Additionally, limitations in computer hardware, the specific game engine being used, or even in a developer’s skillset may preclude the implementation of some designed functions.

Bugs are also an ever-present danger, threatening to not only break immersion or the gameplay experience, but also to introduce unexpected inconsistency in play that could confound the data, especially if some players encounter them and others do not. Playtesting can be considered the design equivalent of bug fixing (Rouse III 484) and should be applied continuously as the game evolves. Blatant bugs that impede gameplay can become frustrating for players, potentially affecting positive gains that an intervention may be trying to stimulate. To manage these risks, the designed game and gamification mechanisms should be tested thoroughly.

Games require testing throughout the entire development process. Good games benefit from the use of iterative design methods, where they undergo a continuous cycle of design, development, and testing phases until a satisfactory game is achieved (Salen and Zimmerman 11; Rouse III 292). Playtesting, then, becomes a frequently occurring informal process. This ensures that more complexity is not added on to an already ill-functioning system. Bug identification is also more efficient in a simple system than in a complex one – needles are more easily found among a few pieces of hay than in an entire haystack. In other words, testing early and often increases the process’s overall efficiency, reducing the time spent isolating problems, identifying fixes, and implementing them.

In addition to iterative testing, a final phase of testing should occur before the game is introduced to participants. This phase should include greater rigor than previous instances of playtesting. Testers associated with the game’s development or professional testers should attempt to play the game in varied ways, taking care to thoroughly explore the environment with their avatar. As they uncover gameplay bugs, testers should enter them into a detailed log that can be given to developers for patching. If non-developer players have not yet served as playtesters, it is important to involve them at this final stage of testing. Players will often play games in ways that were not anticipated by the development team. As such, it is likely that they will identify bugs that were not caught by previous testers. In Medulla, several bugs were found by non-developer players. Even after the developer believed he had cleared the game of all bugs, playtesters found several spots where they could fall through the level, get stuck, disappear, shift immovable objects, or otherwise break the game.

Likewise, players of varied skill levels should also participate in testing in order to account for the differing ways in which novices and experts may approach the game. Here, the skill-difficulty match that was described earlier in this paper should be a focal point. A novice may repeatedly miss a jump that an expert would make with ease, uncovering a missing collider or trigger that causes the avatar to fall endlessly through the level. This is also a good time to make a final check on the difficulty of the game, ensuring that both novices and experts are capable of completing and enjoying the game.


5 Experimentation

Once testing concludes and the game is deemed sufficient for release, experimentation can begin. Regardless of one’s procedure and specific intervention, experiments that use a freshly developed game require special attention. No matter how thoroughly the game was tested, there may still be bugs that can confound the experiment. Experimenters should carefully review the gameplay to make sure that participants are not encountering any undesirable play experiences that result from undiscovered bugs. This should be conducted non-invasively, so as not to affect the participant’s gameplay. This can be done in a number of ways. For example, video capture software can be used to capture the gameplay experience for later review, the monitor’s output can be cloned on another monitor away from the player’s view, or the experimenter can watch from afar, being careful not to make participants feel uneasy from an observing eye. Remaining non-intrusive is important in order to avoid the Hawthorne Effect – the tendency for participant behavior to change when they are being watched (Payne and Payne 107).

Figure 9. Experimenter’s Log for Medulla

Experimenters should keep a log to record any bugs encountered by participants (see Figure 9 for an example). This log enables the experimenter to record bugs and associate them with participant numbers or other anonymous identification methods in order to facilitate analysis at the end of experimentation. When a bug is recorded, the experimenter must be as specific as possible, listing the location, the player action that caused it (if possible), and to what extent it altered the participant’s experience or impeded gameplay (e.g., small graphical glitch that did not seem to impede play vs. a situation in which a participant was unable to progress and the level required restarting). When a small number of bugs are identified during the experimentation process, bugs should be given identification codes so that data recorded for participants who experienced them can be isolated and examined. If a large number of bugs are being identified, it may be worthwhile to revert to the development phase and resume bug fixing in order to preserve the integrity of the data. Participant experiences should be consistent and an overwhelming number of impactful bugs will likely lead to unreliable data.
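The bug log described above can be kept in any structured form that ties bug codes to anonymous participant identifiers. The sketch below shows one hypothetical representation; the bug codes, fields, and severity labels are illustrative, not taken from Medulla's actual log.

```python
# Sketch of an experimenter's bug log keyed by anonymous participant ID.
# All codes and fields below are illustrative.

bug_log = [
    {"participant": "P07", "bug": "B01",
     "location": "level 2, moving platform",
     "severity": "cosmetic"},   # graphical glitch; play not impeded
    {"participant": "P11", "bug": "B02",
     "location": "level 3, spike pit",
     "severity": "blocking"},   # avatar stuck; level restart required
]

def participants_with_bug(log, code):
    """Isolate participants who encountered a given bug for later analysis."""
    return {entry["participant"] for entry in log if entry["bug"] == code}

blocking = [e for e in bug_log if e["severity"] == "blocking"]
print(participants_with_bug(bug_log, "B02"), len(blocking))
```

Recording severity alongside location makes it straightforward, at analysis time, to separate cosmetic glitches from bugs that actually altered the participant's experience.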

On a final note, researchers should consider including more participants than is deemed necessary by a power analysis, a technique used to determine the minimum number of participants needed to detect an effect of a given size with acceptable statistical power. During data analysis, data for participants who encountered disruptive bugs may be deemed unreliable and excluded from the dataset. Additional participant data may also be useful during data analysis to statistically determine which bugs impacted gameplay to the extent that the data should not be used, a process that will be described in the data analysis section.
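As a rough illustration of overrecruiting beyond a power analysis, the sketch below uses the normal approximation for a two-sample comparison, n per group ≈ 2·((z₁₋α/₂ + z₁₋β)/d)², and then pads the result for anticipated bug-related exclusions. This is an approximation with an arbitrary 15% padding factor; dedicated tools (e.g., G*Power) give more exact values.

```python
# Rough a priori power analysis via the normal approximation for a
# two-sample t-test: n per group ~= 2 * ((z_alpha + z_beta) / d) ** 2.
# The 15% overrecruitment padding is an illustrative choice.

import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical z
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

base = n_per_group(effect_size=0.5)      # medium effect (Cohen's d = 0.5)
padded = math.ceil(base * 1.15)          # overrecruit for bug exclusions
print(base, padded)
```

The padding means that even if some participants hit disruptive bugs and must be excluded, the remaining sample can still meet the minimum the power analysis requires.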


6 Data Analysis

The detailed bugs recorded during experimentation will prove useful during data analysis. For instance, if unexpected or unusual patterns are identified in the data, the researcher can see if participants exhibiting these patterns are also participants who have encountered a particular bug. Additionally, significance testing can be conducted between participants who have experienced a particular bug and those who have not, from the same condition. Such testing will identify, with reasonable certainty, if the bug did or did not have an effect on the variables being examined. If no significant difference is found between the two groups, the researcher can proceed with data analysis with greater confidence in the integrity of the results. However, a statement should still be made in any corresponding publications noting the limitations of the game and the possible effects of bugs.
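The bug-versus-no-bug comparison described above can be sketched with a simple two-sample statistic. The example below computes Welch's t statistic with only the standard library, on illustrative play times rather than real Medulla data; in practice a full test with p-values (e.g., `scipy.stats.ttest_ind`) should be used, and a |t| well below the usual critical values (roughly 2 for small samples) is only an informal sign that the bug had little effect.

```python
# Sketch: comparing play time (minutes) between same-condition participants
# who did and did not encounter a given bug. Data are illustrative.

from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

no_bug = [44, 48, 52, 55, 61]
hit_bug = [45, 47, 53, 56, 60]

t = welch_t(no_bug, hit_bug)
print(f"Welch t = {t:.2f}")
```

A near-zero statistic, as with these illustrative numbers, would support proceeding with the analysis while still noting the bug as a limitation in any corresponding publication.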

If significant differences are found, one can reasonably conclude that the bugs confounded the study, and the data should not be used. One might argue that if enough participants have bug-free data to form a sufficient sample for statistical analysis, the confounded data points can simply be discarded and the analyses run on the “clean” data. Such a conclusion is dangerous. Individual differences in play style may be what led some participants, and not others, to encounter the bug; the discarded data would then represent an entire subset of players, and the results and any conclusions would generalize only to the remaining subset. This situation is likely to make the data unreliable and the experiment non-replicable. Instead, the bugs should be fixed and the experiment conducted again.



In conclusion, we make the following recommendations:

  1. Avoid the ceiling effect: If you intend to use play time as a metric, ensure the game contains enough content to fill the longest play session a participant might produce.
  2. Support players of all skill levels: Consider including multiple paths of varying difficulty so novices can choose easier paths and experts can choose more difficult paths that better match their skill level.
  3. Be deliberate in testbed selection: Each engine has its strengths and weaknesses. Choose wisely.
  4. Test frequently: Iterate often to identify bugs early, to avoid building upon an ill-functioning system (which makes issues harder to isolate), and to avoid wasting time on a feature that does not perform its intended function.
  5. Beware the Hawthorne Effect: Participants’ behavior may change if they feel they are being observed. Conduct observations through video capture software, a cloned monitor output, or direct observation from a distance.
  6. Handle bugs and incomplete datasets appropriately: Keep a bug log during experimentation to identify which participants encountered bugs, and which bugs they encountered. Choose carefully whether or not to exclude participants, remaining aware that individual differences may be responsible for these issues, and include more participants than a power analysis deems necessary in case some data must be excluded.
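The ceiling-effect recommendation can be checked mechanically after a pilot: if a noticeable fraction of participants exhaust the content (i.e., their play time is clipped at the maximum the game supports), the metric is compromised. The sketch below uses an arbitrary 10% threshold as an illustration, not an established cutoff:

```python
def ceiling_effect(play_times, max_play_time, threshold=0.10):
    """Flag a possible ceiling effect: True if the fraction of participants
    whose play time reached the content limit exceeds the threshold."""
    at_max = sum(1 for t in play_times if t >= max_play_time)
    return at_max / len(play_times) > threshold

# Hypothetical pilot data: 3 of 10 players hit a 30-minute content limit,
# so play time is clipped for them and the flag is raised.
flag = ceiling_effect([12, 18, 30, 22, 30, 9, 30, 25, 14, 20], 30)
```

If the flag is raised in piloting, the remedy is more content, not statistical correction: clipped play times inflate Type I error in the ways Austin and Brunner describe.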

While this paper details some of the challenges and considerations inherent in developing a testbed game for the evaluation of gamification interventions, overcoming these challenges yields a game with the level of control and reusability that empowers researchers during empirical testing. Game testbeds can then provide experiences that enable the testing of gamification interventions in either engaging or non-engaging environments, depending on the anticipated use of the intervention and the specific design of the game.

This paper presents an early theoretical examination of the design of such a game, and the topic warrants further research. While the paper integrates the principles of good research design with the game design, development, testing, and implementation process, future work should treat it as a framework whose individual components require deeper analysis (e.g., how much content is required to avoid the ceiling effect?). As this process is refined, it will enable more efficient investigation of gamification interventions that cannot be tested within their intended software environments, providing testing platforms that afford a deep level of experimental control and reusability across multiple experiments and ultimately breaking the gamification experimentation barrier.


Works Cited

Austin, Peter C. and Lawrence J. Brunner. “Type I Error Inflation in the Presence of a Ceiling Effect.” The American Statistician 57.2 (2003): 97-104. Web. Jan. 2016.

Cepada, René Alberto García. “Reconciling Art History and Video Games.” The Journal of Digital Media Arts and Practice (2015). Web. Nov. 2015.

Creighton, Ryan H. Unity 3D Game Development by Example: A Seat-of-Your-Pants Manual for Building Fun, Groovy Little Games Quickly. Birmingham, UK: Packt Publishing Ltd, 2010. Print.

Crawford, Chris. On Game Design. Berkeley, CA: New Riders Games, 2003. Print.

Davis, John P., Keith Steury, and Randy Pagulayan. “A Survey Method for Assessing Perceptions of a Game: The Consumer Playtest in Game Design.” Game Studies 5.1 (2005). Web.

Fanfarelli, Joseph R. “The Effects of Narrative and Achievements on Learning in a 2D Platformer Video Game.” Diss. University of Central Florida, 2014. Web.

Fanfarelli, Joseph and Stephanie Vie. “Medulla: A 2D Sidescrolling Platformer Game that Teaches Basic Brain Structure and Function.” Well Played 4.2 (2015): 7-29. Web.

Fullerton, Tracy. Game Design Workshop: A Playcentric Approach to Creating Innovative Games. Amsterdam: Elsevier, 2008. Print.

Garney, Ben, and Eric Preisz. Video Game Optimization. Australia: Cengage Learning. Print.

Hamari, Juho, Jonna Koivisto, and Harri Sarsa. “Does Gamification Work? – A Literature Review of Empirical Studies on Gamification.” Proceedings of the 47th Hawaii International Conference on System Sciences, Hilton Waikoloa, HI, 2014. 3025-3034. Web. Jan. 2016.

Hong, Chen, Ma Qin, and Zhu Dehai. “Research of Interactive Virtual Agriculture Simulation Platform Based on Unity3d.” Journal of Agricultural Mechanization Research 3 (2012): 49. Web. Jan. 2016.

Indraprastha, Aswin and Michihiko Shinozaki. “The Investigation on Using Unity3D Game Engine in Urban Design Study.” Journal of ICT Research and Applications 3.1 (2009): 1-18. Web. Jan. 2016.

Kapp, Karl M. The Gamification of Learning and Instruction: Game-Based Methods and Strategies for Training and Education. San Francisco, CA: Pfeiffer, 2012. Print.

Landwehr, Brian. “Big Games: One Company’s Experience with Gamification of Health.” Games for Health Journal 3.2 (2014): 64-66. Web. Jan. 2016.

Malone, Thomas W. What Makes Things Fun to Learn? A Study of Intrinsically Motivating Computer Games. Palo Alto, CA: Xerox PARC. Aug. 1980. Web. Jan. 2016.

McDaniel, Rudy and Joseph R. Fanfarelli. “Rhythm and Cues: Project Management Tactics for UX in Game Design.” International Journal of Sociotechnology and Knowledge Development 7.3 (2015): 20-37. Print.

Ni, Lebo, Qi Peng, Yu Lina, and Wang Jing. “The Research and Application of Products Virtual Exhibition Technology Base on Unity 3D.” Digital Technology and Application 9 (2010): 36. Web. Jan. 2016.

Nakamura, Jeanne and Mihaly Csikszentmihalyi. “The Concept of Flow.” Handbook of Positive Psychology. Ed. C.R. Snyder and Shane J. Lopez. New York: Oxford University Press, 2001. 89-105. Web. Jan. 2016.

Payne, Geoff and Judy Payne. Key Concepts in Social Research. London, UK: Sage, 2004. Print.

Rouse III, Richard. Game Design Theory and Practice, 2nd ed. Plano, TX: Wordware Publishing, 2005. Print.

Salen, Katie and Eric Zimmerman. Rules of Play: Game Design Fundamentals. Cambridge, MA: The MIT Press, 2004. Print.

Schell, Jesse. The Art of Game Design: A Book of Lenses, Burlington, MA: Morgan Kaufmann, 2008. Print.

Wang, Sa, Zhengli Mao, Changhai Zeng, Huili Gong, Shanshan Li, and Beibei Chen. “A New Method of Virtual Reality Based on Unity3D.” Proceedings of the 18th International Conference on Geoinformatics, Beijing, China, 2010. 1-5. Web. Jan. 2016.

Wefald, Andrew J. and Ronald G. Downey, “Construct Dimensionality of Engagement and its Relation with Satisfaction.” The Journal of Psychology 143.1 (2009): 91-111. Web. Jan. 2016.

Xie, Leslie, Alissa N. Antle, and Nima Motamedi. “Are Tangibles More Fun? Comparing Children’s Enjoyment and Engagement Using Physical Graphical and Tangible User Interfaces.” Proceedings of the Conference on Tangible and Embedded Interaction, Bonn, Germany, 2008. 191-198. Web. Jan 2016.






