A comparison of peer and tutor feedback
Authors: John Hamer, Helen Purchase, Andrew Luxton-Reilly, Paul Denny
Date: 2015-01-02
We report on a study comparing peer feedback with feedback written by tutors in a large undergraduate software engineering programming class. Feedback generated by peers is generally held to be of lower quality than feedback from experienced tutors, and this study sought to explore the extent and nature of this difference. We looked at how seriously peers undertook the reviewing task, differences in the level of detail in feedback comments and differences with respect to tone (whether comments were positive, negative or neutral, offered advice or addressed the author personally). Peer feedback was also compared by academic standing and by gender. We found that, while tutors wrote longer comments than peers and gave more specific feedback, in other important respects (such as offering advice) the differences were not significant.
- Q1: How seriously did the peers take their responsibilities? We consider this with respect to the number of comment boxes completed (Q1a) and the length of the comments (Q1b). Q2: Was there any difference between the peers’ and tutors’ reviewing? We consider this with respect to the general/specific dimension (Q2a), with respect to the positive/negative/neutral/advice/personal/off-topic dimension (Q2b) and with respect to the marks given (Q2c). Q3: Were there any differences with respect to personal characteristics, specifically academic ability (Q3a) and gender (Q3b)? (@hamer2015, 154)
- We began by making a random selection of 10% of the project authors. Each student in this sample of 59 was marked by one tutor and peer reviewed by up to four students. We collated and classified all of the comments generated by both the tutor marking and peer reviewing processes for all project authors in our sample. Comments were classified by four coders. The coders initially discussed the coding scheme and classified 10 sample comments together. The comments were then divided in two, and each coder independently classified half the comments. After this initial classification, the coders paired up and compared their results, coming to a consensus decision on any differences. Comments were assigned uniform anonymous identifiers, so the coders were not aware of whether a comment was written by a tutor or a student. The three primary categories for classification were ‘positive’, ‘negative’ and ‘advice/action’. A comment was classified as positive if it highlighted something that was done well, and negative if it highlighted something that was done poorly. Advice/action comments gave suggestions for making modifications to the program. These three primary categories were further divided into specific and general. Specific comments targeted particular elements of the code, whereas general comments were either vague or more high-level. We defined two additional categories: ‘personal voice’ and ‘off-topic’. Comments were classified as personal voice if they were written in the second person (i.e. ‘you’) or included other personal features such as emoticons. Off-topic comments were unrelated to the project. This gave us a total of eight categories, and each comment could be classified with any number of these. (@hamer2015, 155)
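To keep the eight-category, multi-label scheme straight, here is a minimal Python sketch of one way the coded comments could be represented. The category names, comment IDs and example comments are my own illustrative assumptions, not artefacts from the study.

```python
from collections import Counter
from enum import Enum, auto

class Category(Enum):
    # Three primary categories, each split into specific/general...
    POSITIVE_SPECIFIC = auto()
    POSITIVE_GENERAL = auto()
    NEGATIVE_SPECIFIC = auto()
    NEGATIVE_GENERAL = auto()
    ADVICE_SPECIFIC = auto()
    ADVICE_GENERAL = auto()
    # ...plus two standalone categories, for eight in total.
    PERSONAL_VOICE = auto()
    OFF_TOPIC = auto()

# Multi-label: each comment carries a set of categories, possibly more than one.
# Hypothetical examples, not data from the study.
coded_comments: dict[str, set[Category]] = {
    "c001": {Category.POSITIVE_GENERAL},          # e.g. "Nice solution overall."
    "c002": {Category.NEGATIVE_SPECIFIC,
             Category.ADVICE_SPECIFIC,
             Category.PERSONAL_VOICE},            # e.g. "Your loop never ends; add a bound."
}

# A simple per-category tally, mirroring the kind of counts the paper
# compares between tutor and peer reviews.
tally = Counter(cat for cats in coded_comments.values() for cat in cats)
```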
- We expected, and found, that tutors identify more points to comment on than peers, and are able to make more specific comments on technical matters such as correctness. The increased frequency of negative comments in reviews by tutors and by high-performing students reflects a confidence in the course material. This pattern conforms to our own anecdotal observation that negativity increases with the initial acquisition of expertise, but then subsequently reduces as teaching experience tempers expectations. (@hamer2015, 162)
- We, and others, have argued that effective peer review does not depend on the feedback produced by peers being of the same standard as tutors’, as its primary value arises from the process of writing a review. (@hamer2015, 162)
- I had not considered this before! Makes sense though. It's the metacognitive act and the analysis against one's own solution that are the most valuable parts of PCR.