A comparison of peer and tutor feedback
Authors: John Hamer, Helen Purchase, Andrew Luxton-Reilly, Paul Denny
Date: 2015-01-02
We report on a study comparing peer feedback with feedback written by tutors in a large undergraduate software engineering programming class. Feedback generated by peers is generally held to be of lower quality than feedback from experienced tutors, and this study sought to explore the extent and nature of this difference. We looked at how seriously peers undertook the reviewing task, differences in the level of detail in feedback comments and differences with respect to tone (whether comments were positive, negative or neutral, offered advice or addressed the author personally). Peer feedback was also compared by academic standing and by gender. We found that, while tutors wrote longer comments than peers and gave more specific feedback, in other important respects (such as offering advice) the differences were not significant.
- Q1: How seriously did the peers take their responsibilities? We consider this with respect to the number of comment boxes completed (Q1a) and the length of the comments (Q1b). Q2: Was there any difference between the peers’ and tutors’ reviewing? We consider this with respect to the general/specific dimension (Q2a), with respect to the positive/negative/neutral/advice/personal/off-topic dimension (Q2b) and with respect to the marks given (Q2c). Q3: Were there any differences with respect to personal characteristics, specifically academic ability (Q3a) and gender (Q3b)? (@hamer2015, 154)
- We began by making a random selection of 10% of the project authors. Each student in this sample of 59 was marked by one tutor and peer reviewed by up to four students. We collated and classified all of the comments generated by both the tutor marking and peer reviewing processes for all project authors in our sample. Comments were classified by four coders. The coders initially discussed the coding scheme and classified 10 sample comments together. The comments were then divided in two, and each coder independently classified half the comments. After this initial classification, the coders paired up and compared their results, coming to a consensus decision on any differences. Comments were assigned uniform anonymous identifiers, so the coders were not aware of whether a comment was written by a tutor or a student. The three primary categories for classification were ‘positive’, ‘negative’ and ‘advice/action’. A comment was classified as positive if it highlighted something that was done well, and negative if it highlighted something that was done poorly. Advice/action comments gave suggestions for making modifications to the program. These three primary categories were further divided into specific and general. Specific comments targeted particular elements of the code, whereas general comments were either vague or more high-level. We defined two additional categories: ‘personal voice’ and ‘off-topic’. Comments were classified as personal voice if they were written in the second person (i.e. ‘you’) or included other personal features such as emoticons. Off-topic comments were unrelated to the project. This gave us a total of eight categories, and each comment could be classified with any number of these. (@hamer2015, 155)
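To keep the eight-category, multi-label scheme straight, here is a minimal Python sketch of one way the coded comments could be represented. The category names, comment IDs and example comments are my own illustrative assumptions, not artefacts from the study.

```python
from collections import Counter
from enum import Enum, auto

class Category(Enum):
    # Three primary categories, each split into specific/general...
    POSITIVE_SPECIFIC = auto()
    POSITIVE_GENERAL = auto()
    NEGATIVE_SPECIFIC = auto()
    NEGATIVE_GENERAL = auto()
    ADVICE_SPECIFIC = auto()
    ADVICE_GENERAL = auto()
    # ...plus two standalone categories, for eight in total.
    PERSONAL_VOICE = auto()
    OFF_TOPIC = auto()

# Multi-label: each comment carries a set of categories, possibly more than one.
# Hypothetical examples, not data from the study.
coded_comments: dict[str, set[Category]] = {
    "c001": {Category.POSITIVE_GENERAL},          # e.g. "Nice solution overall."
    "c002": {Category.NEGATIVE_SPECIFIC,
             Category.ADVICE_SPECIFIC,
             Category.PERSONAL_VOICE},            # e.g. "Your loop never ends; add a bound."
}

# A simple per-category tally, mirroring the kind of counts the paper
# compares between tutor and peer reviews.
tally = Counter(cat for cats in coded_comments.values() for cat in cats)
```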
- We expected, and found, that tutors identify more points to comment on than peers, and are able to make more specific comments on technical matters such as correctness. The increased frequency of negative comments in reviews by tutors and by high-performing students reflects a confidence in the course material. This pattern conforms to our own anecdotal observation that negativity increases with the initial acquisition of expertise, but then subsequently reduces as teaching experience tempers expectations. (@hamer2015, 162)
- We, and others, have argued that effective peer review does not depend on the feedback produced by peers being of the same standard as tutors’, as its primary value arises from the process of writing a review. (@hamer2015, 162)
- I had not considered this before! Makes sense though. It's the metacognitive act and the analysis against one's own solution that are the most valuable parts of PCR.