Methodology
Target Population
The target population for this study consisted of third-year students, generally between 18 and 20 years of age, enrolled at a public English-language CEGEP in Quebec, Canada. Participants were drawn from two sections of a Game Programming course offered within the Computer Science (CS) program at John Abbott College. These students were nearing completion of their DEC (Diplôme d'études collégiales) and had completed multiple programming courses, making them well positioned to engage in more advanced software development practices. They were selected because of their advanced programming experience and the increased importance of Peer Code Review (PCR) skills at this stage of the curriculum. Notably, these students were preparing to enter the workforce through a mandatory internship (stage) in the following semester, where collaboration, feedback, and code quality practices are emphasized by industry partners. As such, improving their ability to give and receive meaningful code feedback was both pedagogically timely and professionally relevant.
Research Design
This study employed a mixed-methods, quasi-experimental design to investigate the following research questions:
- RQ1: Does a game-based learning intervention increase the quality of feedback provided during Computer Science peer code review?
- RQ2: Does the game-based learning intervention influence students' perceived competence, autonomy, and relatedness, as conceptualized by Self-Determination Theory?
A total of 42 students completed the pre-test. Due to absenteeism, 39 students completed the post-test and were included in the motivation analysis, while the 37 students who completed both the pre- and post-intervention peer feedback activities were included in the analysis of feedback quality. Students who missed either feedback activity were excluded from the feedback analysis; as a result, sample sizes vary slightly across the different analyses.
Data collection took place during Week 10 (pre-test) and Week 12 (post-test) of the Fall 2024 semester. A pre-test/post-test design was chosen to assess within-subject changes in student motivation and feedback quality following the introduction of the game-based learning (GBL) intervention. This design allowed students to serve as their own comparison, with improvements measured relative to their baseline performance.
Procedure
The sequence of activities in this study is visually represented in [Figure 2].
Figure 2
Intervention Sequence Diagram
```mermaid
sequenceDiagram
    %% Define Actors
    actor Student
    actor Instructor
    participant Moodle
    participant Game

    %% Light blue for Pre-Intervention (Async)
    rect rgb(180, 190, 254)
        Note over Student, Game: Pre-Intervention (Asynchronous, Week 9)
        Student->>Moodle: Submits peer feedback
        Instructor->>Moodle: Scrapes & scores feedback
        Moodle->>Instructor: Provides feedback quality scores
        Instructor->>Game: Assigns yellow action cards based on feedback quality
    end

    %% Light green for Class Session 1 (Sync)
    rect rgb(166, 227, 161)
        Note over Student, Game: Intervention (Synchronous, Week 10)
        Instructor->>Student: Informed consent & Pre-test survey
        Instructor->>Student: Explains game rules & hands out cards
        Student->>Game: Plays first game session
        Instructor->>Student: Reveals feedback-based card distribution
    end

    %% Light yellow for Post-Intervention (Async)
    rect rgb(249, 226, 175)
        Note over Student, Game: Intervention (Asynchronous, Week 11)
        Student->>Moodle: Submits second peer feedback
        Instructor->>Moodle: Scores updated feedback
        Moodle->>Instructor: Provides updated scores
        Instructor->>Game: Assigns new yellow action cards
    end

    %% Light red for Class Session 2 (Sync)
    rect rgb(243, 139, 168)
        Note over Student, Game: Post-Intervention (Synchronous, Week 12)
        Instructor->>Student: Distributes updated game cards
        Student->>Game: Plays second game session
        Instructor->>Student: Post-test survey
    end
```
Note. This diagram visualizes the chronological sequence of events in the study across four key phases. Time progresses from top to bottom. The entities on the top and bottom represent the roles or systems involved: Student, Instructor, Moodle (an online learning management system), and Game (the card-based peer feedback intervention). The arrows represent direct actions (e.g., submitting feedback or handing out game cards). Each coloured section represents a week in the semester, distinguishing asynchronous phases (done outside of class time) and synchronous phases (conducted during scheduled class time).
Pre-Intervention Phase
Prior to this study, students had been engaging in traditional peer feedback activities since Week 4 of the semester, using the PCR Rubric (Appendix A) as a reference for evaluating their peers' work. This rubric provided a structured framework that guided their feedback, ensuring consistency and clarity in their evaluations. These prior experiences with peer review helped establish a baseline understanding of feedback expectations before the intervention was introduced.
Prior to the intervention, students participated in asynchronous peer feedback through the Moodle Learning Management System's (LMS) Workshop activity (Moodle, 2024). Each student provided feedback to three peers, and this feedback was extracted using a custom scraper (Appendix E) developed by the author. The extracted feedback was anonymized and analyzed using a Large Language Model (LLM) (OpenAI, 2024), which categorized comments based on a Code Review Taxonomy (Appendix B). The taxonomy classifies feedback into distinct categories based on specificity and constructiveness, such as "SA" (Specific Actionable), "G+" (General Positive), or "G0" (General Neutral).
To guide the LLM's classification, a few-shot approach (Appendix G) was used, in which the model was provided with a small number of labeled examples to infer how to apply the taxonomy to new comments. This strategy allows LLMs to generalize effectively without extensive training data (Anglin & Ventura, 2024). To verify the LLM's classifications, a subset of outputs was manually reviewed by the author. During this process, the prompting strategy and card-distribution scripts (Appendix E) were refined iteratively to improve classification consistency. While this verification process was informal and not independently validated, the reviewed samples showed a high level of agreement with the intended taxonomy categories, suggesting the LLM output was sufficiently reliable for the purposes of this exploratory study.
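The actual scraper, scoring scripts, and prompt are provided in Appendices E and G. As a minimal sketch of the few-shot classification step only, the snippet below uses the OpenAI Python client; the model name, prompt wording, and labelled examples are placeholders and not the exact ones used in the study.

```python
# Hypothetical sketch of the few-shot classification step (see Appendices E and G
# for the actual scraper and prompt). Model name and examples are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A handful of labelled examples illustrating the Code Review Taxonomy (Appendix B).
FEW_SHOT_EXAMPLES = [
    ("Rename `t` to `elapsedTime` on line 42 so the timer logic is self-documenting.", "SA"),
    ("Nice use of a state machine for the enemy AI.", "G+"),
    ("Looks fine to me.", "G0"),
]

def classify_comment(comment: str, model: str = "gpt-4o-mini") -> str:
    """Return a taxonomy code (e.g., 'SA', 'S+', 'G0') for one feedback comment."""
    examples = "\n".join(f'Comment: "{text}"\nLabel: {label}' for text, label in FEW_SHOT_EXAMPLES)
    prompt = (
        "Classify the peer code review comment using the taxonomy codes "
        "SA, S+, S-, S0, G+, G-, GA, G0, PV, OT.\n\n"
        f"{examples}\n\nComment: \"{comment}\"\nLabel:"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```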
To quantify the quality of feedback for analysis, each taxonomy category was assigned a numerical score using a predefined conversion system (Table 1). These scores were then used to determine the number of cards received at the start of the game, introducing a performance-based starting condition for the intervention.
Table 1
Numerical Conversion of Feedback Quality Scores
| Code | Description | Score |
|---|---|---|
| SA | Specific Actionable | 5 |
| S+/S- | Specific Positive/Negative | 4 |
| S0 | Specific Neutral | 3 |
| G+/G-/GA | General Positive/Negative/Advice | 2 |
| G0/PV | General Neutral/Placeholder Value | 1 |
| OT | Off-topic/Irrelevant | 0 |
Each student provided feedback to three peers, and the median of these three numerical scores was used as their individual feedback quality score in statistical analysis. The median was chosen to reduce the influence of outliers or inconsistencies in individual comments, providing a more robust measure of typical feedback quality for each student.
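To make the conversion and aggregation concrete, the sketch below maps taxonomy codes to the scores in Table 1 and takes the median over a student's three comment scores. The names and structure are illustrative and are not taken from the scripts in Appendix E.

```python
# Illustrative conversion from taxonomy codes (Table 1) to numerical scores,
# followed by the per-student median used in the analysis. Names are hypothetical.
from statistics import median

SCORE_BY_CODE = {
    "SA": 5,
    "S+": 4, "S-": 4,
    "S0": 3,
    "G+": 2, "G-": 2, "GA": 2,
    "G0": 1, "PV": 1,
    "OT": 0,
}

def feedback_quality_score(taxonomy_codes: list[str]) -> float:
    """Median score over the three comments a student gave to peers."""
    return median(SCORE_BY_CODE[code] for code in taxonomy_codes)

# Example: one specific-actionable, one general-positive, one general-neutral comment.
print(feedback_quality_score(["SA", "G+", "G0"]))  # -> 2
```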
To ensure that the game could be reasonably completed within a class session, a simulation was developed (Appendix E) to play 1,000 rounds of the game under varying conditions. The results indicated that the average game lasted 13 turns, with the longest game reaching 24 turns. In terms of duration, the simulation estimated an average game time of 19 minutes, with the longest recorded game taking 35 minutes. These findings informed the game design parameters, such as the number of starting resources and the inclusion of time-limiting mechanics to maintain feasibility within the allotted class period.
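The actual simulation code is provided in Appendix E. As a minimal stand-in showing the general Monte Carlo approach, the sketch below plays a simplified turn model 1,000 times and summarizes turn counts and estimated durations; the win condition, resource gains, and per-turn timing are placeholder assumptions rather than the real game rules.

```python
# Minimal stand-in for the game-length simulation in Appendix E. The turn model,
# win condition, and timing assumptions here are placeholders, not the real rules.
import random
import statistics

def simulate_game(win_resources: int = 10, seconds_per_turn: float = 90.0) -> tuple[int, float]:
    """Play one simplified game: each turn, a random player gains 0-2 resources."""
    players = [0, 0, 0, 0]
    turns = 0
    while max(players) < win_resources:
        turns += 1
        players[random.randrange(4)] += random.randint(0, 2)
    return turns, turns * seconds_per_turn / 60  # (turns, minutes)

results = [simulate_game() for _ in range(1_000)]
turns, minutes = zip(*results)
print(f"mean turns: {statistics.mean(turns):.1f}, max turns: {max(turns)}")
print(f"mean minutes: {statistics.mean(minutes):.1f}, max minutes: {max(minutes):.1f}")
```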
Intervention Phase
During a synchronous class session, students first completed the informed consent form (Appendix C), followed by a pre-test (Appendix D) that measured their perceived autonomy, competence, and relatedness in relation to peer feedback, along with baseline questions about their gaming habits and attitudes. They were then placed into groups of four and received physical card decks for gameplay. The instructor displayed a table assigning yellow action cards to each student, prompting their curiosity about the distribution.
Students played the card game (Appendix F) under standard conditions, engaging with mechanics centred on resource collection, strategic decision-making, and competition. Although peer feedback was not a direct action within the game, it was embedded in the game structure: students' starting resources (yellow action cards) were determined by the quality of their feedback in the previous peer review activity. Each student's feedback was analyzed and scored using a code review taxonomy (Appendix B), and their score was used to assign an initial advantage in the game. Since the course was Game Programming, the game's entities (e.g., State Machine, Timer, Collision, Sprite) were drawn from foundational development concepts covered in class, enhancing topical relevance and familiarity. This feedback-performance link was revealed after the first game session during a debriefing, when students were shown how their starting cards were derived from their peer feedback scores. This design choice created a delayed but meaningful incentive for quality feedback, connecting academic effort to in-game success.
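The exact score-to-card mapping is implemented in the card-distribution scripts (Appendix E). As a purely hypothetical illustration of how a feedback quality score could translate into a starting advantage, thresholds such as the following could be used:

```python
# Hypothetical score-to-card mapping; the real distribution logic is in Appendix E.
def starting_yellow_cards(feedback_quality_score: float) -> int:
    """Convert a 0-5 feedback quality score into a starting-card advantage."""
    if feedback_quality_score >= 4.5:
        return 3
    if feedback_quality_score >= 3.0:
        return 2
    if feedback_quality_score >= 1.5:
        return 1
    return 0

print(starting_yellow_cards(2.0))  # a median score of 2 -> 1 yellow action card
```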
Post-Intervention Phase
Following the first game session, students completed another asynchronous peer feedback activity through the Moodle LMS, knowing that their feedback quality would impact their performance advantages in a future game session. The second iteration of the game followed the same structure as the first, with students receiving yellow action cards based on their new feedback quality scores. After playing the game for the second time, students completed the post-test survey (Appendix D), measuring changes in their perceptions of competence, autonomy, and relatedness in relation to peer feedback, along with two open-ended questions to solicit suggestions about improved game mechanics and any comments about the game influencing their motivation.
Instruments
Code Review Taxonomy (RQ1)
The Code Review Taxonomy (Appendix B) was used to operationalize the concept of feedback quality for RQ1, which asked whether the GBL intervention improved the quality of PCR. This taxonomy categorized feedback comments into distinct types (Hamer, Purchase, Luxton-Reilly, & Denny, 2015; Indriasari, Denny, Lottridge, & Luxton-Reilly, 2023). Feedback was classified as either positive or negative, depending on whether it reinforced correct code implementation or identified issues. Additionally, comments were categorized based on whether they provided actionable advice or suggestions for improvement. The taxonomy also distinguished between general feedback (addressing broader coding concepts) and code-specific feedback (focusing on particular lines of code or implementation details). These categories provided a structured framework for analyzing feedback quality.
While no formal psychometric validation (e.g., inter-rater reliability or construct validity) is reported for this taxonomy, it has been used in multiple studies in computing education to analyze the quality of peer code review comments. Indriasari et al. (2023) adopted the taxonomy from Hamer et al. (2015), noting that it aligns with characteristics of effective written feedback outlined in broader feedback literature, such as specificity, constructive suggestions, and reinforcement of strengths (Gehringer, 2017; Voelkel, Varga-Atkins, & Mello, 2020). This alignment with pedagogical goals supports its use as a practical framework for categorizing feedback in this context.
Intrinsic Motivation Inventory (RQ2)
The Intrinsic Motivation Inventory (IMI) was used to address RQ2, which focused on whether the intervention influenced students' motivation as conceptualized by SDT. The IMI is a validated Likert-style survey that assesses the SDT sub-scales of competence, autonomy, and relatedness (Ryan, Mims, & Koestner, 1983). It uses a 5-point scale (1 = not at all true to 5 = very true). Survey questions were adapted to reflect the PCR experience; the full list of pre-test and post-test questions is included in Appendix D. For example, competence-related questions asked whether students thought their feedback was useful to others. Autonomy-related questions asked students whether they felt they had choices in how they provided peer feedback or whether they had input in deciding how to evaluate their peers' work. Relatedness was assessed through questions that explored whether students felt connected to their peers during the peer review process and whether they felt comfortable giving feedback.
The IMI has demonstrated strong validity and internal consistency across multiple domains (McAuley, Duncan, & Tammen, 1989). The SDT research community recognizes that minor wording adjustments and even shorter versions can be used without compromising reliability (Self-Determination Theory, n.d.). This flexibility makes the IMI particularly well-suited to educational contexts like this one, where survey fatigue and contextual relevance are concerns.
Data Analysis
Data analysis was organized around the two research questions, each targeting a distinct dependent variable. The independent variable was the implementation of the game-based learning intervention, specifically, the peer feedback card game played by the students in Weeks 10 and 12.
To address RQ1, which asked whether the intervention improved the quality of peer feedback, the dependent variable was students' feedback quality scores. Each student provided feedback to three peers in both the pre- and post-intervention phases. To account for variability across different peer reviews, the median feedback quality score from each student's three evaluations was used for the analysis. Because these scores were ordinal, the Wilcoxon Signed-Rank Test was used to assess pre-post differences.
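As an illustration, the paired comparison can be run with SciPy's implementation of the Wilcoxon Signed-Rank Test; the score arrays below are hypothetical stand-ins for the 37 paired observations collected in the study.

```python
# Paired Wilcoxon signed-rank test on per-student median feedback quality scores.
# The arrays below are placeholders, not the study's actual data.
from scipy.stats import wilcoxon

pre_scores = [2, 1, 3, 2, 4, 2, 1, 3]    # hypothetical pre-intervention medians
post_scores = [3, 2, 4, 4, 4, 3, 2, 5]   # hypothetical post-intervention medians

statistic, p_value = wilcoxon(pre_scores, post_scores)
print(f"W = {statistic:.1f}, p = {p_value:.3f}")
```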
To address RQ2, which investigated whether the intervention influenced students' perceived competence, autonomy, and relatedness, the dependent variables were the sub-scale scores from the adapted IMI. Independent t-tests were conducted on the mean scores for each sub-scale, as the pre- and post-tests were completed anonymously and thus could not be paired. Perceived autonomy was measured using items Q5, Q6, Q8, and Q9; however, Q6 was excluded from analysis due to ambiguous wording, and Q9 was reverse-scored. Perceived competence was measured using Q2, Q3, and Q4, while relatedness was measured using Q1 and Q7.
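A sketch of this scoring and testing procedure is shown below, assuming survey responses are loaded into pandas data frames with columns Q1 through Q9. Only the item-to-sub-scale mapping follows the description above; the response values and function names are hypothetical.

```python
# Sketch of the sub-scale scoring and independent t-tests. Item numbers follow the
# mapping described above; the response data frames are hypothetical stand-ins.
import pandas as pd
from scipy.stats import ttest_ind

SUBSCALES = {
    "autonomy": ["Q5", "Q8", "Q9"],      # Q6 excluded; Q9 reverse-scored below
    "competence": ["Q2", "Q3", "Q4"],
    "relatedness": ["Q1", "Q7"],
}

def subscale_means(responses: pd.DataFrame) -> pd.DataFrame:
    """Reverse-score Q9 on the 5-point scale, then average items per sub-scale."""
    scored = responses.copy()
    scored["Q9"] = 6 - scored["Q9"]  # reverse-score (1<->5, 2<->4)
    return pd.DataFrame({name: scored[items].mean(axis=1) for name, items in SUBSCALES.items()})

def compare(pre: pd.DataFrame, post: pd.DataFrame) -> None:
    """Run an independent t-test per sub-scale, since responses could not be paired."""
    pre_m, post_m = subscale_means(pre), subscale_means(post)
    for name in SUBSCALES:
        t, p = ttest_ind(pre_m[name], post_m[name])
        print(f"{name}: t = {t:.2f}, p = {p:.3f}")

# Tiny hypothetical example with three respondents per survey.
pre = pd.DataFrame({f"Q{i}": [3, 4, 2] for i in range(1, 10)})
post = pd.DataFrame({f"Q{i}": [4, 4, 3] for i in range(1, 10)})
compare(pre, post)
```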
A significance level of α = .05 was used for all statistical tests.
In addition to quantitative data, students' open-ended responses from the post-test survey were analyzed using thematic coding. Responses were reviewed inductively to identify emergent themes related to students' motivation, perceptions of the game's mechanics, and suggestions for its improvement. This qualitative data supported interpretation of the quantitative results and helped contextualize student experiences during the intervention.
Ethical Considerations
This study received ethical approval from both the Université de Sherbrooke (Appendix H) on April 16, 2024 and John Abbott College (Appendix I) on May 14, 2024. The researcher also completed the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS 2: CORE 2022) training (Appendix J) on March 23, 2024, which certifies adherence to Canadian standards for research ethics. Data collection took place during Weeks 9 to 12 of the Fall 2024 semester. The results were analyzed during the Winter 2025 term, and the thesis was written in parallel to complete the requirements for submission to the Université de Sherbrooke by Spring 2025. This schedule ensured that all research activities were conducted within the approved ethical review period.
The researcher's dual role as both instructor and investigator raised potential concerns regarding coercion and power dynamics. To mitigate this, explicit informed consent was obtained (Appendix C), and students were informed that participation was entirely voluntary and would not affect their grades. They had the option to withdraw at any time without penalty. Students were given a clear explanation of the study's purpose, procedures, potential risks and benefits, as well as methods of data collection and use, thereby ensuring informed decision-making.
Anonymity and confidentiality were maintained throughout the process. Pre- and post-test survey responses were collected anonymously to protect students' motivational data. Feedback quality data, however, was linked to individual students to enable the intervention's game mechanic; in these cases, only the researcher had access to identifiable data. Before analysis, all peer feedback was anonymized and scrubbed of identifying details. All data were stored securely on Canadian servers via the Moodle LMS and Microsoft Forms.
The dissemination of findings poses minimal risk to participants. All results are reported in aggregate or anonymized form to ensure that individual students cannot be identified. No quotations or specific feedback samples are attributed to individual students. Findings will be shared through academic presentations, conferences, journals, and the final thesis submission, with no foreseeable negative impact on participants.