Research Proposal
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS
- CS: Computer Science
- PCR: Peer Code Review
- GBL: Games-based Learning
- SDT: Self-Determination Theory
- IMI: Intrinsic Motivation Inventory
- CRT: Code Review Taxonomy
CHAPTER 1: PROBLEM STATEMENT
A fundamental part of professional software development (Li, 2006), Peer Code Review (PCR) involves developers evaluating each other's code based on style guides and best practices. These reviews often focus on aspects such as naming conventions, function scope, spacing, and documentation, and typically lead to a back-and-forth dialogue aimed at improving code quality. PCR is widely adopted in the industry as a key quality assurance practice, and educational research suggests it can also support student learning by encouraging reflection, collaboration, and analytical thinking (Powell & Kalina, 2009; Race, 2001).
Despite these pedagogical benefits, a common challenge in Computer Science (CS) education is that students are often unmotivated to provide high-quality peer feedback. This lack of motivation can stem from time constraints, unclear incentives, or uncertainty about the value of the review process (Indriasari, Luxton-Reilly, & Denny, 2021).
Many students experience PCR as a task done out of obligation rather than personal interest. They may view it as a hoop to jump through rather than an opportunity for learning, especially when peer feedback activities are tied to marks or framed primarily as accountability tools (Falchikov, 2013). When students lack meaningful choice or understanding of the activity's purpose, they tend to engage at a surface level, writing generic or rushed comments that do little to support learning (Pintrich, 2003; Ramsden, 2003). In college contexts such as CEGEP, this is further complicated by systemic pressures like the R-score, which ranks students relative to their peers and amplifies external motivators (Dagres, 2017).
Students may also hesitate to provide detailed or critical feedback because they feel unqualified to evaluate a peer's work (Falchikov, 2013). This is especially true in technical domains like programming, where skill gaps between students can be significant and perceived expertise carries social weight (Perez-Quinones & Turner, 2009). Lacking confidence in their own abilities, some students resort to vague praise or neutral observations rather than offering concrete suggestions for improvement. The theory of self-efficacy underscores the importance of perceived competence in determining effort and persistence in learning tasks (Bandura, 2012). Without support to develop feedback literacy, students may miss opportunities to learn from the review process themselves (Indriasari, Luxton-Reilly, & Denny, 2020; Petersen & Zingaro, 2018).
Finally, PCR can feel disconnected and impersonal, especially when carried out anonymously or asynchronously. Without visible social cues or shared norms, students may worry that their feedback could be misinterpreted or cause tension with classmates (Falchikov, 2013). This fear can lead to overly cautious comments or avoidance altogether, weakening the collaborative potential of the activity (Powell & Kalina, 2009). When students do not feel a sense of community or shared responsibility, the peer review process risks becoming transactional and isolated (Indriasari, Denny, Lottridge, & Luxton-Reilly, 2023). Building peer trust and social presence is therefore essential to creating a classroom environment where feedback is both valued and effective.
While logistical and interpersonal challenges can also impact the effectiveness of PCR (Falchikov, 2013; Indriasari, Denny, Lottridge, & Luxton-Reilly, 2023), the motivational barriers described above remain particularly challenging in traditional peer review settings. Together, these barriers point to a need to explore alternative instructional strategies that can better support students' psychological needs and improve the efficacy of the feedback process in CS education.
CHAPTER 2: CONCEPTUAL FRAMEWORK
Providing effective code review feedback is a fundamental skill for Computer Science (CS) students as they prepare to enter the workforce (Sadowski, Söderberg, Church, Sipko, & Bacchelli, 2018). From my experience as a professional software developer, I can attest to how important Peer Code Review (PCR) is for programmers. From my experience as a CS student, I know that traditional academic approaches do not always motivate learners, particularly in the PCR process. Superficial feedback benefits neither the reviewer nor the reviewee and does little to improve code quality or encourage deep learning (Ramsden, 2003). It is important to create an environment where feedback is constructive and empowers students as part of the development process (Hattie & Timperley, 2007); such an environment is particularly valuable for developing the feedback skills that professional software developers rely on. In my experience as a CS teacher, PCR sessions often reveal a lack of student motivation, reflected in feedback that tends to be brief, vague, or lacking in constructive value.
Low motivation during PCR in the classroom poses a significant challenge for educators aiming to maximize the effectiveness of PCR practices. Self-Determination Theory (SDT), a meta-theory of human motivation and personality grounded in psychological science, provides a potential lens for understanding this phenomenon by highlighting the importance of three fundamental psychological needs (competence, autonomy, relatedness) for intrinsic motivation (Deci & Ryan, 1985, 1994). In the context of PCR, these needs can be understood as follows: competence, where students may doubt their ability to provide valuable feedback or feel that the focus is solely on error-finding or quality-assurance testing; autonomy, where limited choice in how to conduct PCR (which code to review, the feedback format, etc.) may stifle student ownership; and relatedness, where the absence of a community focus or a shared sense of purpose can diminish the feeling that PCR is a collaborative improvement process. Traditional PCR approaches may fail to adequately support these needs.
Game-Based Learning (GBL) offers a promising approach to address these motivational barriers hindering effective PCR. GBL prioritizes immersion, challenge, and (sometimes) social interaction (Papastergiou, 2009). These elements have the potential to: enhance competence, where well-designed challenges and in-game rewards can build confidence as coding proficiency increases; foster autonomy, where GBL systems can offer choices within a structured learning experience, increasing student agency; and promote relatedness, where narrative and collaborative gameplay can make PCR feel more purposeful and community-oriented (Proulx, Romero, & Arnab, 2017; Uysal & Yildirim, 2016).
Students learn more effectively when they are agents in constructing their own knowledge, both individually and in collaboration with others (Vygotsky, 1978). This belief is foundational to my interest in the peer feedback process and aligns with a social-constructivist understanding of learning that emphasizes shared meaning-making through interaction. To encourage PCR, feedback systems must be intentionally designed to align with intended learning outcomes and assessment criteria (Biggs, 2012). When learning outcomes related to professional behaviour and feedback literacy are clearly connected to assessment, students are more likely to see value in the process (Ladyshewsky, 2012).
As an avid player of both digital and analogue games, my experience in gaming also influences my interest in this topic. In the world of gaming, especially in multiplayer games, communication and teamwork are paramount for success. Similarly, in the area of PCR, effective communication and collaboration are essential for producing high-quality code. The problem-solving and critical thinking skills honed through gaming also translate to the world of programming and code review. The analytical mindset and attention to detail required in gaming parallel the skills needed for thorough code review (Schmitz, Czauderna, & Klemke, 2011). Understanding how to motivate students in the context of PCR aligns with the principles of game design, where the goal is to create meaningful play through game mechanics that link player action to future outcomes (Salen & Zimmerman, 2003). I believe my experience playing games and teaching CS provides me with a unique perspective on the dynamics of PCR and drives me to delve deeper into this topic.
This study examines students' perceived motivation to give quality PCR feedback through the lens of SDT, focusing on how a meaningful GBL intervention might transform PCR into a more intrinsically motivating and valuable learning experience. The conceptual framework guiding this research is presented in Figure 1, which illustrates how GBL and PCR contribute to the fulfillment of the three psychological needs (competence, autonomy, and relatedness) and how these, in turn, affect intrinsic motivation and feedback quality.
Figure 1
Conceptual Framework
flowchart TD
    %% Inputs
    GBL("`**Game-Based Learning** _Approach_`")
    PCR("`**Peer Code Review** _Activity_`")
    MP[\"`**Meaningful Play** _Design_`"/]
    %% Psychological needs
    A("Autonomy")
    C("Competence")
    R("Relatedness")
    %% Outcomes
    MOTIVATION[/"`Intrinsic Motivation`"\]
    FEEDBACK("`Improved Feedback Quality`")
    GBL --> MP
    PCR --> MP
    MP --> A
    MP --> C
    MP --> R
    A --> MOTIVATION
    C --> MOTIVATION
    R --> MOTIVATION
    MOTIVATION --> FEEDBACK

Figure 2

Study Procedure

sequenceDiagram
    %% Define actors
    actor Student
    actor Instructor
    participant Moodle
    participant Game
    rect rgb(180, 190, 254)
        %% Light blue: Pre-Intervention (Async)
        Note over Student,Game: Pre-Intervention (Asynchronous, Week 9)
        Student->>Moodle: Submits peer feedback
        Instructor->>Moodle: Scrapes & scores feedback
        Moodle->>Instructor: Provides feedback quality scores
        Instructor->>Game: Assigns yellow action cards based on feedback quality
    end
    rect rgb(166, 227, 161)
        %% Light green: Class Session 1 (Sync)
        Note over Student,Game: Intervention (Synchronous, Week 10)
        Instructor->>Student: Informed consent & pre-test survey
        Instructor->>Student: Explains game rules & hands out cards
        Student->>Game: Plays first game session
        Instructor->>Student: Reveals feedback-based card distribution
    end
    rect rgb(249, 226, 175)
        %% Light yellow: Post-Intervention (Async)
        Note over Student,Game: Intervention (Asynchronous, Week 11)
        Student->>Moodle: Submits second peer feedback
        Instructor->>Moodle: Scores updated feedback
        Moodle->>Instructor: Provides updated scores
        Instructor->>Game: Assigns new yellow action cards
    end
    rect rgb(243, 139, 168)
        %% Light red: Class Session 2 (Sync)
        Note over Student,Game: Post-Intervention (Synchronous, Week 12)
        Instructor->>Student: Distributes updated game cards
        Student->>Game: Plays second game session
        Instructor->>Student: Post-test survey
    end
Note. This diagram visualizes the chronological sequence of events in the study across four key phases. Time progresses from top to bottom. The entities on the top and bottom represent the roles or systems involved: Student, Instructor, Moodle (an online learning management system), and Game (the card-based peer feedback intervention). The arrows represent direct actions (e.g., submitting feedback or handing out game cards). Each coloured section represents a week in the semester, distinguishing asynchronous phases (done outside of class time) and synchronous phases (conducted during scheduled class time).
Pre-Intervention Phase
Prior to this study, students had been engaging in traditional peer feedback activities since Week 4 of the semester, using the PCR Rubric (Appendix A) as a reference for evaluating their peers' work. This rubric provided a structured framework that guided their feedback, ensuring consistency and clarity in their evaluations. These prior experiences with peer review helped establish a baseline understanding of feedback expectations before the intervention was introduced.
Prior to the intervention, students participated in asynchronous peer feedback through the Moodle Learning Management System's (LMS) Workshop activity (Moodle, 2024). Each student provided feedback to three peers, and this feedback was extracted using a custom scraper (Appendix E) developed by the author. The extracted feedback was anonymized and analyzed using a Large Language Model (LLM) (OpenAI, 2024), which categorized comments based on a Code Review Taxonomy (Appendix B). The taxonomy classifies feedback into distinct categories based on specificity and constructiveness, such as "SA" (Specific Actionable), "G+" (General Positive), or "G0" (General Neutral).
To guide the LLM's classification, a few-shot approach (Appendix G) was used, in which the model was provided with a small number of labeled examples to infer how to apply the taxonomy to new comments. This strategy allows LLMs to generalize effectively without extensive training data (Anglin & Ventura, 2024). To verify the LLM's classifications, a subset of outputs was manually reviewed by the author. During this process, the prompting strategy and card-distribution scripts (Appendix E) were refined iteratively to improve classification consistency. While this verification process was informal and not independently validated, the reviewed samples showed a high level of agreement with the intended taxonomy categories, suggesting the LLM output was sufficiently reliable for the purposes of this exploratory study.
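To make the classification step concrete, the sketch below shows how a few-shot prompt of this kind might be assembled and sent to a model. It is illustrative only: the model name, example comments, and helper function are hypothetical stand-ins, and the study's actual prompt and scripts are those documented in Appendices E and G.

```python
# Illustrative sketch; the study's real prompt and scripts are in Appendices E and G.
from openai import OpenAI  # assumes the official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A handful of labeled examples (few-shot) show the model how to apply the taxonomy.
FEW_SHOT = [
    ("Nice work!", "G+"),
    ("Rename `doStuff` to `updateTimer` so the intent is clear.", "SA"),
    ("The collision check ignores the sprite's scale.", "S-"),
]

def classify_comment(comment: str) -> str:
    """Return a Code Review Taxonomy code (e.g., SA, G+, G0) for one comment."""
    examples = "\n".join(f'Comment: "{c}" -> {code}' for c, code in FEW_SHOT)
    prompt = (
        "Classify the peer code review comment using one of the codes "
        "S+, S-, S0, SA, G+, G-, G0, GA, OT. Respond with the code only.\n"
        + examples
        + f'\nComment: "{comment}" ->'
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice; the study's model is not specified here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```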
To quantify the quality of feedback for analysis, each taxonomy category was assigned a numerical score using a predefined conversion system (Table 1). These scores were then used to determine the number of cards received at the start of the game, introducing a performance-based starting condition for the intervention.
Table 1
Numerical Conversion of Feedback Quality Scores
| Code | Description | Score |
|---|---|---|
| SA | Specific Actionable | 5 |
| S+/S- | Specific Positive/Negative | 4 |
| S0 | Specific Neutral | 3 |
| G+/G-/GA | General Positive/Negative/Advice | 2 |
| G0/PV | General Neutral/Placeholder Value | 1 |
| OT | Off-topic/Irrelevant | 0 |
Each student provided feedback to three peers, and the median of these three numerical scores was used as their individual feedback quality score in statistical analysis. The median was chosen to reduce the influence of outliers or inconsistencies in individual comments, providing a more robust measure of typical feedback quality for each student.
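As a sketch of this scoring step, assuming each comment has already been assigned a taxonomy code, the Table 1 conversion and the per-student median can be expressed as follows (function and variable names are hypothetical):

```python
from statistics import median

# Table 1: taxonomy code -> numerical feedback quality score
SCORES = {
    "SA": 5,
    "S+": 4, "S-": 4,
    "S0": 3,
    "G+": 2, "G-": 2, "GA": 2,
    "G0": 1, "PV": 1,
    "OT": 0,
}

def feedback_quality_score(codes: list[str]) -> float:
    """Median score across a student's three reviews; robust to a single outlier."""
    return median(SCORES[code] for code in codes)

# Example: one actionable, one generic, and one placeholder comment -> median of 2
print(feedback_quality_score(["SA", "G+", "PV"]))  # 2
```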
To ensure that the game could be reasonably completed within a class session, a simulation was developed (Appendix E) to play 1,000 rounds of the game under varying conditions. The results indicated that the average game lasted 13 turns, with the longest game reaching 24 turns. In terms of duration, the simulation estimated an average game time of 19 minutes, with the longest recorded game taking 35 minutes. These findings informed the game design parameters, such as the number of starting resources and the inclusion of time-limiting mechanics to maintain feasibility within the allotted class period.
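The real simulation belongs to Appendix E; the skeleton below only illustrates the Monte Carlo approach, with invented end conditions and pacing standing in for the actual game rules:

```python
import random
from statistics import mean

SECONDS_PER_TURN = 90  # assumed pacing; the real parameters are in Appendix E

def play_one_game(starting_cards: int) -> int:
    """Toy stand-in for one game: returns the number of turns until it ends."""
    turns, resources = 0, starting_cards
    while resources < 10 and turns < 40:          # invented win/stop conditions
        resources += random.choice([0, 1, 1, 2])  # draw-like progress each turn
        turns += 1
    return turns

# Play 1,000 simulated games under varying starting conditions
turn_counts = [play_one_game(random.randint(1, 5)) for _ in range(1_000)]
print(f"average turns: {mean(turn_counts):.1f}, longest game: {max(turn_counts)}")
print(f"average minutes: {mean(turn_counts) * SECONDS_PER_TURN / 60:.0f}")
```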
Intervention Phase
During a synchronous class session, students first completed the informed consent form (Appendix C), followed by a pre-test (Appendix D) that measured their perceived autonomy, competence, and relatedness in relation to peer feedback, along with baseline questions about their gaming habits and attitudes. They were then placed into groups of four and received physical card decks for gameplay. The instructor displayed a table assigning yellow action cards to each student, prompting their curiosity about the distribution.
Students played the card game (Appendix F) under standard conditions, engaging with mechanics centred on resource collection, strategic decision-making, and competition. Although peer feedback was not a direct action within the game, it was embedded in the game structure: students' starting resources (yellow action cards) were determined by the quality of their feedback in the previous peer review activity. Each student's feedback was analyzed and scored using a code review taxonomy (Appendix B), and their score was used to assign an initial advantage in the game. Since the course was Game Programming, the game's entities (e.g., State Machine, Timer, Collision, Sprite) were drawn from foundational development concepts covered in class, enhancing topical relevance and familiarity. This feedback-performance link was revealed after the first game session during a debriefing, when students were shown how their starting cards were derived from their peer feedback scores. This design choice created a delayed but meaningful incentive for quality feedback, connecting academic effort to in-game success.
Post-Intervention Phase
Following the first game session, students completed another asynchronous peer feedback activity through the Moodle LMS, knowing that their feedback quality would impact their performance advantages in a future game session. The second iteration of the game followed the same structure as the first, with students receiving yellow action cards based on their new feedback quality scores. After playing the game for the second time, students completed the post-test survey (Appendix D), measuring changes in their perceptions of competence, autonomy, and relatedness in relation to peer feedback, along with two open-ended questions to solicit suggestions about improved game mechanics and any comments about the game influencing their motivation.
Instruments
Code Review Taxonomy (RQ1)
The Code Review Taxonomy (Appendix B) was used to operationalize the concept of feedback quality for RQ1, which asked whether the GBL intervention improved the quality of PCR. This taxonomy categorized feedback comments into distinct types (Hamer, Purchase, Luxton-Reilly, & Denny, 2015; Indriasari, Denny, Lottridge, & Luxton-Reilly, 2023). Feedback was classified as either positive or negative, depending on whether it reinforced correct code implementation or identified issues. Additionally, comments were categorized based on whether they provided actionable advice or suggestions for improvement. The taxonomy also distinguished between general feedback (addressing broader coding concepts) and code-specific feedback (focusing on particular lines of code or implementation details). These categories provided a structured framework for analyzing feedback quality.
While no formal psychometric validation (e.g., inter-rater reliability or construct validity) is reported for this taxonomy, it has been used in multiple studies in computing education to analyze the quality of peer code review comments. Indriasari et al. (2023) adopted the taxonomy from Hamer et al. (2015), noting that it aligns with characteristics of effective written feedback outlined in broader feedback literature, such as specificity, constructive suggestions, and reinforcement of strengths (Gehringer, 2017; Voelkel, Varga-Atkins, & Mello, 2020). This alignment with pedagogical goals supports its use as a practical framework for categorizing feedback in this context.
Intrinsic Motivation Inventory (RQ2)
The Intrinsic Motivation Inventory (IMI) was used to address RQ2, which focused on whether the intervention influenced students' motivation as conceptualized by SDT. The IMI is a validated Likert-style survey that assesses the SDT sub-scales of competence, autonomy, and relatedness (Ryan, Mims, & Koestner, 1983). It uses a 5-point scale (1 = not at all true to 5 = very true). Survey questions were adapted to reflect the PCR experience, with the full list of pre-test and post-test questions included in Appendix D. For example, competence-related questions asked whether students thought their feedback was useful to others. Autonomy-related questions asked students whether they felt they had choices in how they provided peer feedback or whether they had input in deciding how to evaluate their peers' work. Relatedness was assessed through questions that explored whether students felt connected to their peers during the peer review process and whether they felt comfortable giving feedback.
The IMI has demonstrated strong validity and internal consistency across multiple domains (McAuley, Duncan, & Tammen, 1989). The SDT research community recognizes that minor wording adjustments and even shorter versions can be used without compromising reliability (Self-Determination Theory, n.d.). This flexibility makes the IMI particularly well-suited to educational contexts like this one, where survey fatigue and contextual relevance are concerns.
Data Analysis
Data analysis was organized around the two research questions, each targeting a distinct dependent variable. The independent variable was the implementation of the game-based learning intervention, specifically, the peer feedback card game played by the students in Weeks 10 and 12.
To address RQ1, which asked whether the intervention improved the quality of peer feedback, the dependent variable was students' feedback quality scores. Each student provided feedback to three peers in both the pre- and post-intervention phases. To account for variability across different peer reviews, the median feedback quality score from each student's three evaluations was used for the analysis. Because these scores were ordinal, the Wilcoxon Signed-Rank Test was used to assess pre-post differences.
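With SciPy, and hypothetical pre/post score vectors, this test reduces to a single call:

```python
from scipy.stats import wilcoxon

# Per-student median feedback quality scores (hypothetical values, paired by student)
pre = [2, 1, 3, 2, 4, 2, 1, 3]
post = [3, 2, 4, 2, 5, 3, 2, 4]

# Wilcoxon Signed-Rank Test: non-parametric test for paired ordinal data
statistic, p_value = wilcoxon(pre, post)
print(f"W = {statistic}, p = {p_value:.3f}")
```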
To address RQ2, which investigated whether the intervention influenced students' perceived competence, autonomy, and relatedness, the dependent variables were the sub-scale scores from the adapted IMI. Independent t-tests were conducted on the mean scores for each sub-scale, as the pre- and post-tests were completed anonymously and thus could not be paired. Perceived autonomy was measured using items Q5, Q6, Q8, and Q9; however, Q6 was excluded from analysis due to ambiguous wording, and Q9 was reverse-scored. Perceived competence was measured using Q2, Q3, and Q4, while relatedness was measured using Q1 and Q7.
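A sketch of the autonomy sub-scale computation and test follows, using randomly generated stand-in responses (the item-to-column mapping and sample size are assumptions for illustration):

```python
import numpy as np
from scipy.stats import ttest_ind

def autonomy_scores(responses: np.ndarray) -> np.ndarray:
    """Autonomy sub-scale: mean of Q5, Q8, and reverse-scored Q9 (Q6 excluded).
    Columns are assumed to hold Q1..Q9 on the 5-point adapted scale."""
    q5, q8, q9 = responses[:, 4], responses[:, 7], responses[:, 8]
    return np.mean([q5, q8, 6 - q9], axis=0)  # 6 - x reverse-scores a 5-point item

# Stand-in anonymous response matrices (rows = respondents, columns = Q1..Q9)
rng = np.random.default_rng(0)
pre = rng.integers(1, 6, size=(20, 9))
post = rng.integers(1, 6, size=(20, 9))

# Independent t-test, since anonymous pre/post responses cannot be paired
t, p = ttest_ind(autonomy_scores(pre), autonomy_scores(post))
print(f"t = {t:.2f}, p = {p:.3f}")
```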
A significance level of α = .05 was adopted for all statistical tests.
In addition to quantitative data, students' open-ended responses from the post-test survey were analyzed using thematic coding. Responses were reviewed inductively to identify emergent themes related to students' motivation, perceptions of the game's mechanics, and suggestions for its improvement. This qualitative data supported interpretation of the quantitative results and helped contextualize student experiences during the intervention.
Ethical Considerations
This study received ethical approval from both the Université de Sherbrooke (Appendix H) on April 16, 2024, and John Abbott College (Appendix I) on May 14, 2024. The researcher also completed the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS 2: CORE 2022) training (Appendix J) on March 23, 2024, which certifies adherence to Canadian standards for research ethics. Data collection took place during Weeks 9 to 12 of the Fall 2024 semester. The results were analyzed during the Winter 2025 term, and the thesis was written in parallel to complete the requirements for submission to the Université de Sherbrooke by Spring 2025. This schedule ensured that all research activities were conducted within the approved ethical review period.
The researcher's dual role as both instructor and investigator raised potential concerns regarding coercion and power dynamics. To mitigate this, explicit informed consent was obtained (Appendix C), and students were informed that participation was entirely voluntary and would not affect their grades. They had the option to withdraw at any time without penalty. Students were given a clear explanation of the study's purpose, procedures, potential risks and benefits, as well as methods of data collection and use, thereby ensuring informed decision-making.
Anonymity and confidentiality were maintained throughout the process. Pre- and post-test survey responses were collected anonymously to protect students' motivational data. Feedback quality data, however, was linked to individual students to enable the intervention's game mechanic; in these cases, only the researcher had access to identifiable data. Before analysis, all peer feedback was anonymized and scrubbed of identifying details. All data were stored securely on Canadian servers via the Moodle LMS and Microsoft Forms.
The dissemination of findings poses minimal risk to participants. All results are reported in aggregate or anonymized form to ensure that individual students cannot be identified. No quotations or specific feedback samples are attributed to individual students. Findings will be shared through academic presentations, conferences, journals, and the final thesis submission, with no foreseeable negative impact on participants.
CHAPTER 5: CLOSING STATEMENT
Appendix A: PCR RUBRIC
Replace with clean code rubric
Appendix B: CODE REVIEW TAXONOMY
Code Review Taxonomy
- S+: Comments in this category provided positive feedback about a specific element of the code.
- S−: Comments in this category provided specific negative feedback about the functionality, style or correctness of the program.
- S0: Comments in this category were specific, but were not obviously positive or negative in tone.
- SA: Comments in this category provided specific advice to a student about how to improve their code.
- G+: Comments in this category are general comments that are positive. The comments do not relate to a specific element of style or requirement specified in the assignment.
- G−: Comments in this category are general negative comments. They do not refer to any specific elements of code, but are instead comments directed at the overall quality (summary comments).
- G0: Comments in this category are general comments that do not have either positive or negative connotations.
- GA: Comments in this category provided general advice to peers, but did not refer to specifics within the code.
- OT: Comments in this category were off-topic.
Appendix C: CONSENT FORM
Purpose
This project is being conducted by Vikram Singh, a Computer Science Teacher at John Abbott College, for the completion of a Master's Degree in College Teaching, accredited by the Université de Sherbrooke. This study explores how different approaches to peer code review affect student motivation and feedback quality.
Procedure
- Before beginning the activity, you'll complete a short survey regarding your perspectives on peer feedback.
- You will participate in a brief card game related to your course material.
- After the card game, you'll complete a survey about your experience.
- Your feedback on the course assignments, along with survey responses, will be analyzed to better understand factors influencing feedback quality and student motivation in peer code review.
Potential Risks & Benefits
There are no known risks for participation in this study.
By investigating what makes peer code review motivating (or not), your participation could lead to the design of interventions that make peer code review a more engaging and beneficial process for everyone. Your participation could help students develop stronger feedback skills, crucial for both their success in CS courses and future careers. The findings could provide valuable information to your instructor and others about how to refine peer code review practices, potentially leading to widespread changes that enhance the learning experience for many CS students. For your interest, the results of the study will be sent to you after the study has been completed, if so desired.
Confidentiality
Your participation in this study is confidential in the following ways:
- Your name will not appear in the research results.
- The researcher/teacher will never know if you agree or do not agree to participate in this study; therefore, the choice to participate or not has no impact on your final grade, nor on any future interaction with your teacher.
- The survey results will be anonymous and kept for five years in Microsoft OneDrive behind two-factor authentication.
- The Microsoft Forms questionnaire will be completed anonymously and your personal information will not be revealed. The servers for Microsoft Forms and OneDrive are stored in Canada and therefore your data is protected by Canadian laws.
Your participation in this research is completely voluntary. You have the right to not consent or withdraw consent at any time. If you have any questions about the content or methods of this study, please feel free to contact the teacher/researcher, Vikram Singh, at [email protected] or the supervisor, Paul Darvasi, at [email protected].
If you have any questions about your rights or treatment during this study, please contact the Research and Innovation Officer at JAC, Teresa Hackett, at [email protected].
Statement of Consent
I attest that I have read the above information and freely consent to participate in the study on peer code review within the context of my 420-5P6 Game Programming course during the Fall 2024 semester at John Abbott College. I understand that my peer feedback data from the course assignments, which may include identifiable information, will be used to facilitate the card game activity and subsequent analysis. I also acknowledge that while this data may be referenced during the activity, my name or any other personal identifiers will not appear in the final research report.
- Student Name
- Student ID
- Date
I wish to receive the results of the study. My email is:
Appendix D: INTRINSIC MOTIVATION INVENTORY
Scale Description
The Intrinsic Motivation Inventory (IMI) is a multidimensional measurement device intended to assess participants' subjective experience related to a target activity in laboratory experiments. It has been used in several experiments related to intrinsic motivation and self-regulation (e.g., Ryan, 1982; Ryan, Mims, & Koestner, 1983; Plant & Ryan, 1985; Ryan, Connell, & Plant, 1990; Ryan, Koestner, & Deci, 1991; Deci, Eghrari, Patrick, & Leone, 1994). The instrument assesses participants' interest/enjoyment, perceived competence, effort, value/usefulness, felt pressure and tension, and perceived choice while performing a given activity, thus yielding six subscale scores. Recently, a seventh subscale has been added to tap the experiences of relatedness, although the validity of this subscale has yet to be established. The interest/enjoyment subscale is considered the self-report measure of intrinsic motivation; thus, although the overall questionnaire is called the Intrinsic Motivation Inventory, it is only this one subscale that assesses intrinsic motivation per se. As a result, the interest/enjoyment subscale often has more items on it than do the other subscales. The perceived choice and perceived competence concepts are theorized to be positive predictors of both self-report and behavioral measures of intrinsic motivation, and pressure/tension is theorized to be a negative predictor of intrinsic motivation. Effort is a separate variable that is relevant to some motivation questions, so it is used where relevant. The value/usefulness subscale is used in internalization studies (e.g., Deci et al., 1994), the idea being that people internalize and become self-regulating with respect to activities that they experience as useful or valuable for themselves. Finally, the relatedness subscale is used in studies having to do with interpersonal interactions, friendship formation, and so on.
The IMI consists of varied numbers of items from these subscales, all of which have been shown to be factor analytically coherent and stable across a variety of tasks, conditions, and settings. The general criteria for inclusion of items on subscales have been a factor loading of at least 0.6 on the appropriate subscale, and no cross loadings above 0.4. Typically, loadings substantially exceed these criteria. Nonetheless, we recommend that investigators perform their own factor analyses on new data sets. Past research suggests that order effects of item presentation appear to be negligible, and the inclusion or exclusion of specific subscales appears to have no impact on the others. Thus, it is rare that all items have been used in a particular experiment. Instead, experimenters have chosen the subscales that are relevant to the issues they are exploring.
The IMI items have often been modified slightly to fit specific activities. Thus, for example, an item such as "I tried very hard to do well at this activity" can be changed to "I tried very hard to do well on these puzzles" or "...in learning this material" without affecting its reliability or validity. As one can readily tell, there is nothing subtle about these items; they are quite face-valid. However, in part because of their straightforward nature, caution is needed in interpretation. We have found, for example, that correlations between self-reports of effort or interest and behavioral indices of these dimensions are quite modest, often around 0.4. Like other self-report measures, there is always the need to appropriately interpret how and why participants report as they do. Ego involvements, self-presentation styles, reactance, and other psychological dynamics must be considered. For example, in a study by Ryan, Koestner, and Deci (1991), we found that when participants were ego involved, they engaged in pressured persistence during a free-choice period, and this behavior did not correlate with the self-reports of interest/enjoyment. In fact, we concluded that to be confident in one's assessment of intrinsic motivation, one needs to find that the free-choice behavior and the self-reports of interest/enjoyment are significantly correlated.
Another issue is that of redundancy. Items within the subscales overlap considerably, although randomizing their presentation makes this less salient to most participants. Nonetheless, shorter versions have been used and been found to be quite reliable. The incremental R for every item above 4 for any given factor is quite small.
Still, it is very important to recognize that multiple item subscales consistently outperform single items for obvious reasons, and they have better external validity.
On The Scale page, there are five sections. First, the full 45 items that make up the 7 subscales are shown, along with information on constructing your own IMI and scoring it. Then, there are four specific versions of the IMI that have been used in past studies. This should give you a sense of the different ways it has been used. These have different numbers of items and different numbers of subscales, and they concern different activities. First, there is a standard, 22-item version that has been used in several studies, with four subscales: interest/enjoyment, perceived competence, perceived choice, and pressure/tension. Second, there is a short 9-item version concerned with the activity of reading some text material; it has three subscales: interest/enjoyment, perceived competence, and pressure/tension. Then, there is the 25-item version that was used in the internalization study, including the three subscales of value/usefulness, interest/enjoyment, and perceived choice. Finally, there is a 29-item version of the interpersonal relatedness questionnaire that has five subscales: relatedness, interest/enjoyment, perceived choice, pressure/tension, and effort.
Finally, McAuley, Duncan, and Tammen (1989) did a study to examine the validity of the IMI and found strong support for its validity.
References
Deci, E. L., Eghrari, H., Patrick, B. C., & Leone, D. (1994). Facilitating internalization: The self-determination theory perspective. Journal of Personality, 62, 119-142.
McAuley, E., Duncan, T., & Tammen, V. V. (1989). Psychometric properties of the Intrinsic Motivation Inventory in a competitive sport setting: A confirmatory factor analysis. Research Quarterly for Exercise and Sport, 60, 48-58.
Plant, R. W., & Ryan, R. M. (1985). Intrinsic motivation and the effects of self-consciousness, self-awareness, and ego-involvement: An investigation of internally-controlling styles. Journal of Personality, 53, 435-449.
Ryan, R. M. (1982). Control and information in the intrapersonal sphere: An extension of cognitive evaluation theory. Journal of Personality and Social Psychology, 43, 450-461.
Ryan, R. M., Connell, J. P., & Plant, R. W. (1990). Emotions in non-directed text learning. Learning and Individual Differences, 2, 1-17.
Ryan, R. M., Koestner, R., & Deci, E. L. (1991). Varied forms of persistence: When free-choice behavior is not intrinsically motivated. Motivation and Emotion, 15, 185-205.
Ryan, R. M., Mims, V., & Koestner, R. (1983). Relation of reward contingency and interpersonal context to intrinsic motivation: A review and test using cognitive evaluation theory. Journal of Personality and Social Psychology, 45, 736-750.
The Scales
THE POST-EXPERIMENTAL INTRINSIC MOTIVATION INVENTORY
(Below are listed all 45 items that can be used depending on which are needed.)
For each of the following statements, please indicate how true it is for you, using the following scale:
1 2 3 4 5 6 7
not at all true | somewhat true | very true
Interest/Enjoyment
- I enjoyed doing this activity very much.
- This activity was fun to do.
- I thought this was a boring activity. (R)
- This activity did not hold my attention at all. (R)
- I would describe this activity as very interesting.
- I thought this activity was quite enjoyable.
- While I was doing this activity, I was thinking about how much I enjoyed it.
Perceived Competence
- I think I am pretty good at this activity.
- I think I did pretty well at this activity, compared to other students.
- After working at this activity for awhile, I felt pretty competent.
- I am satisfied with my performance at this task.
- I was pretty skilled at this activity.
- This was an activity that I couldn't do very well. (R)
Effort/Importance
- I put a lot of effort into this.
- I didn't try very hard to do well at this activity. (R)
- I tried very hard on this activity.
- It was important to me to do well at this task.
- I didn't put much energy into this. (R)
Pressure/Tension
- I did not feel nervous at all while doing this. (R)
- I felt very tense while doing this activity.
- I was very relaxed in doing these. (R)
- I was anxious while working on this task.
- I felt pressured while doing these.
Perceived Choice
- I believe I had some choice about doing this activity.
- I felt like it was not my own choice to do this task. (R)
- I didn't really have a choice about doing this task. (R)
- I felt like I had to do this. (R)
- I did this activity because I had no choice. (R)
- I did this activity because I wanted to.
- I did this activity because I had to. (R)
Value/Usefulness
- I believe this activity could be of some value to me.
- I think that doing this activity is useful for blank.
- I think this is important to do because it can blank.
- I would be willing to do this again because it has some value to me.
- I think doing this activity could help me to blank.
- I believe doing this activity could be beneficial to me.
- I think this is an important activity.
Relatedness
- I felt really distant to this person. (R)
- I really doubt that this person and I would ever be friends. (R)
- I felt like I could really trust this person.
- I'd like a chance to interact with this person more often.
- I'd really prefer not to interact with this person in the future. (R)
- I don't feel like I could really trust this person. (R)
- It is likely that this person and I could become friends if we interacted a lot.
- I feel close to this person.
Constructing the IMI for your study. First, decide which of the variables (factors) you want to use, based on what theoretical questions you are addressing. Then, use the items from those factors, randomly ordered. If you use the value/usefulness items, you will need to complete the three blank items as appropriate. In other words, if you were studying whether the person believes an activity is useful for improving concentration, or becoming a better basketball player, or whatever, then fill in the blanks with that information. If you do not want to refer to a particular outcome, then simply truncate those items so they end at "useful," "helpful," or "important."
Scoring information for the IMI. To score this instrument, you must first reverse score the items for which an (R) is shown after them. To do that, subtract the item response from 8, and use the resulting number as the item score. Then, calculate subscale scores by averaging across all of the items on that subscale. The subscale scores are then used in the analyses of relevant questions.
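As a brief sketch of that procedure (item keys are hypothetical; the 8 - response rule is the one stated above for the 7-point scale):

```python
from statistics import mean

def score_subscale(responses: dict[str, int], items: list[str],
                   reversed_items: set[str]) -> float:
    """Average the item scores for one IMI subscale on the 7-point scale.
    Items marked (R) are reverse-scored as 8 - response."""
    return mean(
        (8 - responses[item]) if item in reversed_items else responses[item]
        for item in items
    )

# Example: a three-item interest/enjoyment subscale with one (R) item
answers = {"enjoyed": 6, "fun": 7, "boring": 2}  # "boring" is reverse-scored
print(score_subscale(answers, ["enjoyed", "fun", "boring"], {"boring"}))
# mean(6, 7, 8 - 2) = 6.33
```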
The following is a 22-item version of the scale that has been used in some lab studies on intrinsic motivation. It has four subscales: interest/enjoyment, perceived choice, perceived competence, and pressure/tension. The interest/enjoyment subscale is considered the self-report measure of intrinsic motivation; perceived choice and perceived competence are theorized to be positive predictors of both self-report and behavioral measures of intrinsic motivation. Pressure/tension is theorized to be a negative predictor of intrinsic motivation. Scoring information is presented after the questionnaire itself.
TASK EVALUATION QUESTIONNAIRE
For each of the following statements, please indicate how true it is for you, using the following scale:
1 2 3 4 5 6 7
not at all true | somewhat true | very true
- While I was working on the task I was thinking about how much I enjoyed it.
- I did not feel at all nervous about doing the task.
- I felt that it was my choice to do the task.
- I think I am pretty good at this task.
- I found the task very interesting.
- I felt tense while doing the task.
- I think I did pretty well at this activity, compared to other students.
- Doing the task was fun.
- I felt relaxed while doing the task.
- I enjoyed doing the task very much.
- I didn't really have a choice about doing the task.
- I am satisfied with my performance at this task.
- I was anxious while doing the task.
- I thought the task was very boring.
- I felt like I was doing what I wanted to do while I was working on the task.
- I felt pretty skilled at this task.
- I thought the task was very interesting.
- I felt pressured while doing the task.
- I felt like I had to do the task.
- I would describe the task as very enjoyable.
- I did the task because I had no choice.
- After working at this task for awhile, I felt pretty competent.
Scoring information. Begin by reverse scoring items # 2, 9, 11, 14, 19, 21. In other words, subtract the item response from 8, and use the result as the item score for that item. This way, a higher score will indicate more of the concept described in the subscale name. Thus, a higher score on pressure/tension means the person felt more pressured and tense; a higher score on perceived competence means the person felt more competent; and so on. Then calculate subscale scores by averaging the item scores for the items on each subscale. They are as follows. The (R) after an item number is just a reminder that the item score is the reverse of the participant's response on that item.
- Interest/enjoyment: 1, 5, 8, 10, 14 (R), 17, 20
- Perceived competence: 4, 7, 12, 16, 22
- Perceived choice: 3, 11 (R), 15, 19 (R), 21 (R)
- Pressure/tension: 2 (R), 6, 9 (R), 13, 18
The subscale scores can then be used as dependent variables, predictors, or mediators, depending on the research questions being addressed.
TEXT MATERIAL QUESTIONNAIRE I
For each of the following statements, please indicate how true it is for you, using the following scale as a guide:
1 2 3 4 5 6 7
not at all true | somewhat true | very true
- While I was reading this material, I was thinking about how much I enjoyed it.
- I did not feel at all nervous while reading.
- This material did not hold my attention at all.
- I think I understood this material pretty well.
- I would describe this material as very interesting.
- I think I understood this material very well, compared to other students.
- I enjoyed reading this material very much.
- I felt very tense while reading this material.
- This material was fun to read.
Scoring information. Begin by reverse scoring items # 2 and 3. In other words, subtract the item response from 8, and use the result as the item score for that item. This way, a higher score will indicate more of the concept described in the subscale name. Then calculate subscale scores by averaging the item scores for the items on each subscale. They are shown below. The (R) after an item number is just a reminder that the item score is the reverse of the participant's response on that item.
- Interest/enjoyment: 1, 3 (R), 5, 7, 9
- Perceived competence: 4, 6
- Pressure/tension: 2 (R), 8
The next version of the questionnaire was used for a study of internalization with an uninteresting computer task (Deci et al., 1994).
ACTIVITY PERCEPTION QUESTIONNAIRE
The following items concern your experience with the task. Please answer all items. For each item, please indicate how true the statement is for you, using the following scale as a guide:
1 2 3 4 5 6 7
not at all true | somewhat true | very true
- I believe that doing this activity could be of some value for me.
- I believe I had some choice about doing this activity.
- While I was doing this activity, I was thinking about how much I enjoyed it.
- I believe that doing this activity is useful for improved concentration.
- This activity was fun to do.
- I think this activity is important for my improvement.
- I enjoyed doing this activity very much.
- I really did not have a choice about doing this activity.
- I did this activity because I wanted to.
- I think this is an important activity.
- I felt like I was enjoying the activity while I was doing it.
- I thought this was a very boring activity.
- It is possible that this activity could improve my studying habits.
- I felt like I had no choice but to do this activity.
- I thought this was a very interesting activity.
- I am willing to do this activity again because I think it is somewhat useful.
- I would describe this activity as very enjoyable.
- I felt like I had to do this activity.
- I believe doing this activity could be somewhat beneficial for me.
- I did this activity because I had to.
- I believe doing this activity could help me do better in school.
- While doing this activity I felt like I had a choice.
- I would describe this activity as very fun.
- I felt like it was not my own choice to do this activity.
- I would be willing to do this activity again because it has some value for me.
Scoring information. Begin by reverse scoring items # 8, 12, 14, 18, 20, and 24 by subtracting the item response from 8 and using the result as the item score for that item. Then calculate subscale scores by averaging the item scores for the items on each subscale. They are shown below. The (R) after an item number is just a reminder that the item score is the reverse of the participant's response on that item.
- Interest/enjoyment: 3, 5, 7, 11, 12 (R), 15, 17, 23
- Value/usefulness: 1, 4, 6, 10, 13, 16, 19, 21, 25
- Perceived choice: 2, 8 (R), 9, 14 (R), 18 (R), 20 (R), 22, 24 (R)
SUBJECT IMPRESSIONS QUESTIONNAIRE
The following sentences describe thoughts and feelings you may have had regarding the other person who participated in the experiment with you. For each of the following statements, please indicate how true it is for you, using the following scale as a guide:
1 2 3 4 5 6 7
not at all true | somewhat true | very true
- While I was interacting with this person, I was thinking about how much I enjoyed it.
- I felt really distant to this person.
- I did not feel at all nervous about interacting with this person.
- I felt like I had choice about interacting with this person.
- I would describe interacting with this person as very enjoyable.
- I really doubt that this person and I would ever become friends.
- I found this person very interesting.
- I enjoyed interacting with this person very much.
- I felt tense while interacting with this person.
- I really feel like I could trust this person.
- Interacting with this person was fun.
- I felt relaxed while interacting with this person.
- I'd like a chance to interact more with this person.
- I didn't really have a choice about interacting with this person.
- I tried hard to have a good interaction with this person.
- I'd really prefer not to interact with this person in the future.
- I was anxious while interacting with this person.
- I thought this person was very boring.
- I felt like I was doing what I wanted to do while I was interacting with this person.
- I tried very hard while interacting with this person.
- I don't feel like I could really trust this person.
- I thought interacting with this person was very interesting.
- I felt pressured while interacting with this person.
- I think it's likely that this person and I could become friends.
- I felt like I had to interact with this person.
- I feel really close to this person.
- I didn't put much energy into interacting with this person.
- I interacted with this person because I had no choice.
- I put some effort into interacting with this person.
Scoring information. Begin by reverse scoring items # 2, 3, 6, 12, 14, 16, 18, 21, 25, 27, and 28 by subtracting the item response from 8 and using the result as the item score for that item. Then calculate subscale scores by averaging the item scores for the items on each subscale. They are shown below. The (R) after an item number is just a reminder that the item score is the reverse of the participant's response on that item.
- Relatedness: 2 (R), 6 (R), 10, 13, 16 (R), 21 (R), 24, 26
- Interest/enjoyment: 1, 5, 7, 8, 11, 18 (R), 22
- Perceived choice: 4, 14 (R), 19, 25 (R), 28 (R)
- Pressure/tension: 3 (R), 9, 12 (R), 17, 23
- Effort: 15, 20, 27 (R), 29