This chapter gives an overview of the integrative evaluation studies carried out during the MANTCHI project. About 20 studies were done, each of a different delivery of the project's teaching material. The chapter presents some main conclusions, each drawing on evidence from multiple studies, together with a discussion of the strengths and weaknesses of the method employed. Overall, the evaluation studies provided both formative and summative results. The formative information for improving the overall delivery of the material is exemplified by a selection of issues and recommendations, and a sample of the evidence on which they were based. The summative evidence on learning outcomes and quality suggests that the project materials are at least as effective as other material, although by no means always preferred by students. Our studies suggest, however, that the main gains are in improved curriculum content and in staff development (expanding the range of topics a teacher is confident of delivering), and that different evaluation methods must be developed and applied to study those gains properly.
This chapter discusses the evaluation work on the MANTCHI project (described below) as a whole. It concerned collaborative teaching over a metropolitan area network, and so involved a mixture of face to face and distance learning, and separate issues of collaboration between learners in different places, and of teachers in different institutions. The project focussed on "tutorial" material rather than primary exposition like lectures, and was interested in the re-use of "tertiary" material such as past student solutions, and whether this was useful to students.
The evaluation work comprised about 20 studies. Full reporting of so many studies, even though of modest size, is not possible within a single chapter. On the other hand, such a chapter can (and this one does) select the main conclusions to report, and draw on evidence across multiple studies to support them. It can also discuss evaluation methods and problems on a broader basis than a single study could.
The chapter is divided into three parts, the first dealing with the overall approach, the second with particular findings, and the third reviewing the adequacy of the evaluation methods. The first part introduces the project's distinctive features and the particular demands these placed on evaluation. It then discusses our approach to evaluation, based on the method of Integrative Evaluation and so emphasising observations of learners engaged in learning on courses for university qualifications. The second part is organised around the major findings, briefly reporting the evidence for each. These findings are grouped into perspectives: learning effectiveness and quality, features of the teaching, and issues of the management of learning and teaching. The third part discusses the successful and unsuccessful aspects of our approach, and offers suggestions for future improvements to the method.
Part A: The project and evaluation activities as a whole
The MANTCHI project
The MANTCHI project (Metropolitan Area Network Tutoring in Computer-Human Interaction; MANTCHI, 1998), which involved four universities in central Scotland for about 19 months, explored the development and delivery of tutorial material in the subject area of Human Computer Interaction (HCI) over the internet in existing university courses for credit. "Tutorial" was broadly defined to mean anything other than primary exposition (such as lectures). This material might for instance consist of an exercise created at another site and backed up by a remote expert (often the author), who might give a video conference tutorial or give feedback on student work submitted and returned over the internet. In some cases students on different courses, as well as the teachers, interacted. A unit of this material is called an "ATOM" (Autonomous Teaching Object in MANTCHI), and is typically designed as one week's work on a module for a student, i.e. 8 to 10 hours including contact time. Responsibility for the courses and assessment remained ultimately with the local deliverer. In this chapter, we refer to these deliverers and also to the authors and remote experts as "teachers" (though their job titles might be lecturer, professor, teaching assistant etc.) in contrast to the learners, i.e. university students.
Teaching and learning normally includes not only primary exposition (e.g. lectures) and re-expression by the learners (e.g. writing an essay), but some iterative interaction between teacher and learners (e.g. question and answer sessions in tutorials, or feedback on written work). Mayes (1995) classifies applications of learning technology into primary, secondary, and tertiary respectively by reference to those categories of activity. Technology such as email and video conferencing supports such tertiary applications. MANTCHI focussed on tertiary applications, and the additional research question of whether such interactions can usefully be captured, "canned", and later re-used. We called such canned material "TRAILs" (Tertiary Reusable ATOM Instantiated for Learning).
A key emerging feature of the project was its organisation around true reciprocal collaborative teaching. All of the four sites have authored material, and all four have received (delivered) material authored at other sites. Although originally planned simply as a fair way of dividing up the work, it has kept all project members crucially aware not just of the problems of authoring, but of what it is like to be delivering to one's own students (in real, for-credit courses) material that others have authored: a true users' perspective. This may be a unique feature. MANTCHI has in effect built users of higher education teaching material further into the design team by having each authoring site deliver material "not created here".
The project devoted substantial resources to evaluation of these teaching and learning activities. This chapter offers a general report on that evaluation activity.
The evaluation approach
The evaluation method used was based on that of Integrative Evaluation (Draper et al., 1996), which was developed during the TILT project (Doughty et al., 1995; TILT, 1996). Its main characteristic is that teaching and learning are studied in actual classroom use. This matters for validity, which is of particular concern for learning in higher education, for two reasons. Firstly, learning here is characterised by, and depends upon, conscious effort and choice, and hence on motivation: if you persuade subjects to use learning materials for an experiment, they behave quite differently than if they are trying to get a qualification. Secondly, learning cannot be seen as a simple effect caused by teaching, but as the outcome of a whole ensemble of factors of which the intervention being studied (for instance some piece of learning technology, or in this case the ATOM materials) is only one. Trying to isolate one factor experimentally is usually unrealistic (leading to invalid studies) because in normal education, learners simply shift their use of resources in response to the particular characteristics of what they are offered. This view of teaching and learning is consistent with Laurillard's (1993) model of the process, which suggests that 12 activities (of which primary exposition such as a lecture or textbook is only one) are involved. When we measure learning outcomes, we are measuring the combined effect of all of these, not just the one we varied, and the others are unlikely to have been held constant, as learners adjust their use of activities and resources as they think necessary (e.g. they ask questions of their tutor when, but only when, they feel the primary exposition was unclear). Such adjustment (i.e. this learner control) is a salient feature of higher education, and not something it is appropriate to suppress in a meaningful study, even if it were possible and ethical to do so.
This is bad news for simple summative evaluation that aims to compare alternative teaching to decide which is best, as Draper (1997b) argues. Evaluation is nevertheless still possible and worthwhile, but it turns out to be mainly useful to teachers, in advising them how to adjust the overall teaching to improve it. Effectively this is formative evaluation of the overall teaching and delivery (not just of the learning technology or other intervention), called "integrative" because many of those adjustments are to do with making the different elements fit together better.
Consistent with that formative role, we have found that many of the most important findings in our studies have been surprises detected by open-ended measures, and not answers to questions we anticipated and so had designed comparable measures for. By "open-ended" we mean that the subject can respond by bringing up issues we did not explicitly ask about, so that we cannot tell how many in the sample care about that issue, since they are not all directly asked about it and required to respond. For example, we might ask "What was the worst problem in using this ATOM?" and one or two might say "getting the printer to work outside lab. hours". The opposite category of measure is "comparable", where all subjects are asked the same question and required to answer in terms of the same fixed response categories, which can then be directly compared across the whole sample. We use about half our evaluation effort on open-ended measures such as classroom observation and open-ended questions in interviews and in questionnaires, with the rest spent on comparable measures such as fixed-choice questions applied to the whole sample.
In previous applications of Integrative Evaluation, an important type of comparable measure has been either confidence logs or multiple choice quiz questions that were closely related to learning objectives in order to measure learning gains. In MANTCHI, while extensively used, they are of less central importance, as the material of interest is not the primary exposition but student exercises and the interactions and feedback associated with them. Furthermore, when we asked teachers why they used the exercises they did, their rationales seldom mentioned learning objectives but seemed to relate to aims for a "deeper" quality of learning. We tended to make greater use of resource questionnaires (Brown et al., 1996) to ask students about the utility and usability of each available learning resource (including the new materials) both absolutely and relative to the other available resources. In this way, we adapted our method to this particular project to some extent, although as we discuss later, perhaps not to a sufficiently great extent.
Our evaluation work
We carried out about 20 studies in all (each of a different delivery of an exercise) in the four universities. These studies were divided into three phases. In the first period, we did some studies of the courses into which new material would be introduced, both to get some comparison for later reference, and to gain experience of studying tutorial exercises. In the second period (the autumn term of 1997), we studied the delivery of the first ATOMs, and circulated some initial practical lessons. In the third period, we studied many more deliveries of ATOMs.
A typical study would begin by eliciting information from the teacher (the local deliverer in charge of that course) about the nature of the course and students, and any particular aims and interests that teacher had that the evaluation might look at. It would include some classroom observation, and interviewing a sample of students at some point. The central measures were one or more questionnaires to the student sample, most importantly after the intervention at the point where it was thought the students could best judge (looking back) the utility of the resources. Thus for an exercise whose product was lecture notes, that point was the exam since lecture notes are probably most used for revision. For a more typical exercise where the students submitted solutions which were then marked, we sometimes chose the time when they submitted their work (best for asking what had been helpful in doing the exercise) and sometimes the time when they got back comments on their work (best for judging how helpful the feedback was). An example questionnaire from a single study is given in the appendix.
Part B: Findings from the evaluation work
There are a number of different ways to organise our findings. The most obvious one would be to organise them by study i.e. evidence first then results. The disadvantage of that is that it tends to split up pieces of evidence that support each other if they were gathered in different studies. Instead, what seem to be the most important findings are simply stated, and then the evidence supporting those conclusions is summarised.
This gives a large set of small sections. One way of grouping them would be by stakeholder perspective: what did the students think? what did the teachers think? what would educational managers e.g. heads of department think? The grouping adopted here is slightly different, into learning issues (e.g. what can we say about learning effectiveness and quality), teaching issues (e.g. is it worthwhile having remote experts?), and management issues (e.g. tips for organising the delivery of ATOMs).
There is some correlation between this grouping and the methodological one of comparable versus open-ended measures. Because we knew in advance we were interested in the issue of learning effectiveness we designed comparable measures for this, whereas most of the management issues emerged from open-ended measures (and complaints). However this correspondence is only approximate: for most issues there is some evidence from both kinds of measure.
Learning effectiveness and quality
The most important prior question about the project's teaching innovations is whether they increase or decrease learning quality and quantity. No significant or even noticeable differences were seen in exam and other assessment scores. However, since these exercises were, like tutorials, only part of the teaching and learning activities on each topic, this is not surprising; nor would a difference have been conclusive had one been observed.
Instead, we may ask questions about the value of the novel learning resources offered to students in this project, both generally and in comparison to others available for the same topic: were they valued or not, important or of little impact? These questions were mainly addressed through using forms of the resource questionnaire (Brown et al. 1996) which asks students to rate the utility of the learning resources available to them.
The clearest evidence for the value of an ATOM as a resource was found for the ATOM on CSCLN: Computer Supported Cooperative Lecture Notes. In this ATOM, the class was divided evenly into teams, with one team assigned to each of the 20 lectures on that module plus one team for the index. Each team had to produce lecture notes for their assigned lecture in the form of a web page, structured as a set of key questions addressed by that lecture and answers for those questions, while the index page maintained a table to these other pages both by timetable (when the lecture was given) and by question (merged from the content of the pages). (This ATOM, and the pages produced by students, may be seen on the WWW: Draper, 1998.)
The main evidence came from a short questionnaire which, since lecture notes find their main use during revision for exams, was administered directly after the exam. Of 59 students, 98% responded; of these, 84% said they had referred to the communal lecture notes, 76% said they found them useful, and, most important of all, 69% said they found them worth the effort of creating their share of them. They also, as a group, rated these web notes as the third most useful resource (after past exam questions and solutions, and the course handouts). This shows that, while not the most important resource for students, nor universally approved by them, this exercise had a favourable cost-benefit tradeoff in the view of more than two thirds of the students. It may be another manifestation of the importance of active learner manipulation of knowledge representations discussed by Ravenscroft et al. (1998).
The same ATOM delivered in different universities
Another good source of evidence comes from the UAN (User Action Notation) ATOM, which was delivered to four groups of students at three universities. (This ATOM may be found through the project web pages; MANTCHI, 1998.) All students were asked "How much did you benefit by taking part [in the UAN ATOM exercise]?". Only three students, all at one university, rated this as zero benefit. They were also asked the more directly interesting question "How did you rate the ATOM as a method of learning compared to the 'traditionally' delivered units of work experienced on your course?". At the first university, where the ATOM's author was also the local deliverer, 41% rated the ATOM as a superior, 50% as a similar, and only 9% as an inferior method. At the second, only 25% rated it as superior and 75% as inferior. At the third, there was a very low response rate (8%) for the questionnaire with this question, but in that sample all (100%) rated it as a superior method.
This is clearly a mixed story. The unfavourable responses are clearly associated in the data as a whole with a high number of complaints about delivery (rather than content) issues, which are discussed below, and also with it not being directly assessed or compulsory. An interesting point here, though, is that the method was most favourably received (at least for the UAN material) where the author was also the local deliverer: so that everything was constant except the formatting as an ATOM. This is direct comparative evidence on the ATOM format itself.
Three ATOMs at one university
Three different ATOMs were delivered in a single course (along with non-ATOM-ised topics) at one university, which should have allowed comparisons to be drawn by the same students among ATOMs, and between ATOMs and other topics not organised in this way. However the main feature here turned out to be the declining numbers of students completing the ATOMs. In this case, the ATOMs were not done for direct credit (marks given for coursework), but only indirectly as the ATOMs were topics that would later be assessed for credit in other ways. Many of these students stated that it was not worth the effort of doing the work without direct credit. In a class of 50, 37 did the first ATOM, 15 the second, and none did the third. This of course destroyed the opportunity for a good cross-ATOM comparison, but directed our attention to the workload question.
Students were asked "Was the 'workload' of the ATOM right for you?" on a 5 point scale. (This question appears in the appendix.) It is interesting that at a university where the ATOM was compulsory, only 32% of the students (who were in year 3 of 4 undergraduate years) rated it above (harder than) the neutral point, whereas at the university where it was not directly assessed (and the learners were a mixture of year 4 and M.Sc. students) 62% rated it as harder work than seemed right.
Features of the teaching
A feature of some of the ATOMs was the involvement of a remote expert at another university. For one ATOM, students at three universities were asked about the usefulness of "receiving feedback etc. from the remote expert on your group's solutions to the tasks". On a 5 point scale with the lowest point meaning of no use at all, the proportion rating it at one of the top 3 ("useful" or better) points were 87%, 57% and 75% in the three universities. This combines the usefulness of getting the feedback with the fact that this came from a remote rather than local expert. It supports the idea that remoteness is at the least not an important drawback for this function.
Open-ended measures suggest a mixed story. For instance, at one university a class contained both year 4 undergraduates and M.Sc. students. The latter perceived much more benefit in having a remote expert than the former. Elsewhere, some students suggested that a benefit of remote experts was not that they had more authority (as national or international experts in the topic), but that because they were not in charge of assigning marks, the students felt freer to argue with them and challenge their judgements. This, if generally felt, would certainly be an advantage in many teachers' eyes, as promoting student discussion is often felt to be difficult. It is also interesting in that it is largely the opposite of what the teachers seemed to feel. For them, the remote expert gave them confidence to deliver material they did not have a deep grasp of, and to handle novel objections and proposed solutions that students come up with.
In some ATOMs, "tertiary" materials (i.e. past exercises, student solutions, and tutor feedback on those solutions) were made available to students. Exploring the use of this kind of material was one of the original aims of the project. Obviously it could not be provided until the ATOM to which it belonged had already been delivered, and so had generated student material for later re-use (unless simulated by the ATOM's author). Evidence of the utility of such material (called "TRAILs") to later students is positive but scanty.
When the statechart ATOM was delivered at one university, there were 50 in the class, of whom 24 completed the questionnaire and 15 did the exercise, but only 9 did both. Of these 9, 6 used the TRAIL, and all 6 found it at least "useful". When the same ATOM was delivered at a second university, there were 11 in the class, of whom 9 returned the questionnaire and 6 did the exercise as well as the questionnaire. Of these 6, 2 used the TRAIL and both found it useful. Thus although we may say that 100% of those who used a TRAIL rated it a useful resource, the numbers using it (whether from choice or simply from happening to notice it) were too low to give much certainty about this positive result by themselves. Open-ended comments were a second source of evidence supporting the positive interpretation, although on an even slenderer numerical base: "[TRAILs] gave an indication of what was expected, though we felt the quality of the submissions was generally poor, we had no knowledge of the acceptable standard required.", and "Bad examples more useful than good. Can see how (and why) NOT to do things. This is much better than being told how to do something 'this way' just 'because'."
Still other kinds of evidence also suggest that this is an important resource to develop further. Firstly, theoretical considerations on the importance of feedback for learning support it. Secondly, in the CSCLN ATOM, the most valued resource overall was past exam questions and outline answers: a similar resource to TRAILs. Thirdly, and perhaps most important, in courses without such resources the open-ended responses frequently ask for more feedback on work, model answers, and so on, strongly suggesting a widely felt need for resources of this kind.
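The caution about small numbers in the TRAIL ratings can be made concrete with a short calculation. The following is a minimal sketch, not part of the original analysis: it pools the TRAIL users reported at the two universities (6 and 2) and computes the exact (Clopper-Pearson) lower confidence bound for the special case in which every respondent gives a positive rating. The function name and the choice of a two-sided 95% level are assumptions made for illustration.

```python
# Illustrative sketch only; not part of the original MANTCHI analysis.

def lower_bound_all_positive(n, confidence=0.95):
    """Exact (Clopper-Pearson) two-sided lower bound on the true proportion
    when all n respondents gave a positive rating.

    For n successes out of n, the bound has the closed form
    (alpha / 2) ** (1 / n), where alpha = 1 - confidence.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)

# Counts reported in the text: TRAIL users, all of whom rated it useful.
trail_users = {"first university": 6, "second university": 2}
pooled = sum(trail_users.values())  # 8 users in total

for site, n in trail_users.items():
    print(f"{site}: n={n}, lower bound = {lower_bound_all_positive(n):.2f}")
print(f"pooled: n={pooled}, lower bound = {lower_bound_all_positive(pooled):.2f}")
```

Under these assumptions, even the pooled sample of 8 users is compatible with a true approval rate of only about 63%, which is why the result is best read as positive but scanty.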
The management of learning and teaching
From the start of our evaluations, as has been the case in many other projects, many of the points that emerged as problems were not about learning outcomes nor about the design of the learning material itself, but were practical points about the management or administration of the activities (e.g. informing students properly about resources and deadlines, availability of computing and other resources). In some descriptions of the educational process these issues are called delivery or implementation (cf. Reigeluth, 1983). From our perspective of seeing learning as the outcome of a whole set of activities (not the one-way delivery of material), we categorise these issues as the management of the learning and teaching process: they concern coordinating and organising those activities, rather than designing their content. This view is presented as an extension to the Laurillard model in Draper (1997a), where the management process is seen as at bottom one of negotiation (tacit or explicit) between teachers and learners.
These findings did not mainly emerge from comparable measures designed to test learning outcomes, but usually from open-ended measures that yield (among other things) complaints by students: mainly open-ended questions in questionnaires administered to whole classes, interviews with a subset of nearly every class we studied, and the direct classroom observations we did in a majority of our studies. Here we present a sample of such findings, together with suggestions for responses as they were circulated within the project. (A complete set is available in Brown and Draper, 1998.) Full lists of them, usually with the student comments transcribed in full, were fed back to the course deliverers for use in improving delivery next time. For the first item, we give details of the evidence on which the finding and recommendations were based; for other items (for reasons of space) details of the evidence are omitted. Should it be important to clarify an issue first identified by open-ended measures, then a more systematic measure could be applied. For instance, when the difficulties of group work and claims about high workload appeared, we designed some systematic measures of these to investigate them further.
Web-based vs. Paper Resources
Web-based instructions and resources may also need to be given to students on paper. During student use of some ATOMs, lecturers handed out paper-based instructions and resources. In some cases this was because the students were unable to access the web-based resources; in other cases it was because the lecturer wished to give the students additional instructions which superseded those on the ATOM web page.
Students usually download and print the web-based resources, which is less efficient than having these resources centrally copied on to paper and handed out. Students reported that though it can be useful to access information etc. electronically, this is not always possible, and in any case they like having a hard copy on which they can make notes. Paper copies also cover the problem of the network not functioning when needed by the students. It is also likely that the students will not have continuous access to computers while completing their assignments.
An example of the evidence (Web-based vs. Paper Resources)
The evidence on which the conclusion and recommendation on web-based vs. paper resources were based was as follows. In one study, all were asked if they had any problems while accessing the web-based resources. 25% reported some problem, examples being "Password problems plus early setbacks with software.", "On learning space — crashes".
In a second study, all students were asked "Did you experience any difficulty gaining access to any resources / activities during the use of the ... ATOM?" Three (13.6%) reported problems: "Remote web page", "server was down from where I had to access on-line.", and "Lab was too busy during lab sessions". They were also asked about resources for which there was insufficient time, which yielded comments including "Remote web page was too remote, took a very long time to view", although another student said "None! Most are web-based and therefore can be accessed at any time, when most convenient".
In a third study, students were asked "What else would have helped at the two tutorials this week?" which elicited an 83% response rate including this long reply: "Computer equipment that worked! A lot of time was wasted in tutorials trying to fight with the equipment being used. It is not a necessity to teach through the use of computers when teaching to a computer course. In fact the opposite is true because computer students above all recognise the problems that can occur by over complicating a problem by using advanced computing e.g. the newsgroup on a web site (where a simple newsgroup added on to [the] news server would have achieved the same inter-communication and been far more reliable/faster than web browsing) and using the scanner (where simply drawing the chart on the computer would have been much faster and produced much clearer results for everyone to view). This is not a criticism of the ATOMs or the teaching method but more of the implementation which although seeming perfectly reasonable proved only to hinder our progress in learning about this topic!"
In a fourth study, students were asked "Did you print the ATOM information and scenario from the Web?"; 10 (45.5%) said yes, 12 (54.5%) said no. They were asked if they used the paper or web form: 7 (31.8%) said paper, 10 (45.5%) said web, 4 (18.2%) said both. They were asked to explain why; among the numerous comments were "I like to save paper", "I took the work home", "some documents don't print well", "Web-based was easier to refer to related documents because of links".
In a fifth study, printed versions were provided but students were asked if they had already printed out the web documents: 25% said yes. When asked which form they used, 45.8% used paper form, 20.8% the web form, 12.5% used both, and 20.8% didn't answer.
In a sixth study, when asked how the ATOM compared to traditionally delivered units, one student said "Personally, I do not like using the net as a learning aid, I spend enough time working on a PC as it is without having to rely on the World Wide Wait to scroll through text on screen. Call me old fashioned, but I do prefer reading from books/journals/papers - a bit more portable and quicker to access - I wish I'd recorded how much time I waste during a week logging on, waiting for Win95 to start, waiting for Netscape etc etc etc. If I have an hour free in between lectures it is just impractical to get any work done on a PC."
In a seventh study, when asked to comment on "How useful do you consider the ... ATOM Web-based resources were to you in learning & understanding ...? ", two explanatory comments (for low usefulness ratings) were "Items in pdf format prohibited many people viewing the docs", and "Paper-based notes are easier to manage and access. Paper notes don't crash!"
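The percentages quoted for the fourth study above can be checked against the counts. The following is a minimal sketch, assuming that the 10 "yes" and 12 "no" answers to the printing question account for all respondents (a total of 22, which is inferred rather than stated in the text); the helper name and dictionary keys are hypothetical.

```python
# Illustrative check of the fourth study's figures; the total of 22
# respondents is inferred from 10 "yes" + 12 "no", not stated in the text.

def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

printed = {"yes": 10, "no": 12}
total = sum(printed.values())  # 22 respondents (assumed)

form_used = {"paper": 7, "web": 10, "both": 4}
# Respondents not covered by the reported paper/web/both breakdown.
unaccounted = total - sum(form_used.values())

for answer, count in printed.items():
    print(f"printed={answer}: {count} ({percent(count, total)}%)")
for form, count in form_used.items():
    print(f"used {form}: {count} ({percent(count, total)}%)")
print(f"unaccounted for: {unaccounted} ({percent(unaccounted, total)}%)")
```

On this assumption the reported percentages are reproduced, and the three form-of-use counts sum to 21, so one respondent is not covered by the reported breakdown (presumably a non-answer to that question).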
Content of ATOM
Students would find it useful if the ATOM contained clear information on: which resources will be delivered locally (in house); what to use (e.g. a real physical radio alarm in an exercise on formal descriptions); access passwords; and the approximate date on which feedback will become available.
Group work involves extra organisation and time, which has to be taken into account. Students recognised the benefits of group work, but found that it took more time than working in pairs or alone. This appeared to matter more where the task was not directly assessed. If possible, group work should be mainly within regular timetabled sessions of the course to avoid clashes between courses. Similarly, video conferences should also be within regular timetabled sessions. (The general problem is that of organising group meetings and irregular class meetings, which suddenly require new times to be found in the face of, for many students, conflicting classes and paid employment.)
Students should be alerted when feedback on their solutions is available. They should also be alerted when feedback on the solutions from other universities is available. That is, posting feedback on the web without emailing an announcement is unsatisfactory.
Instructions about the ATOM resources and assignments have to be sent to students in plenty of time. Students admitted that even if they are given information in plenty of time they may not act on it. However where web-based resources (or any resources) have to be used before an assignment is to be attempted, students have to be given clear instructions in plenty of time for them to be able to plan and use the resources. They have to have the information to allow them to manage their time effectively.
Domain Expert and Local Deliverer
It should be clear to students whether the "in-house" teachers are "experts" or "facilitators". Each ATOM has a domain expert. The lecturer delivering the ATOM to his/her students need not be an expert in the subject. It is useful if the students are made aware of whether the lecturer will be "facilitating" rather than teaching, and that the subject will involve "resource-based" learning utilising the ATOM Web-based resources and a domain expert.
Collaboration between Students from different Universities
One of the ATOMs involved students at two universities. Comments from students at both universities indicated some rivalry, and comparison was also used in the feedback. Although this can be a good thing, we have to be careful that such collaborations do not discourage some students from actively participating.
The Integration of ATOMs into Courses
ATOMs are discrete units. The point has been raised that ATOMs could fragment a course, reducing the possibility of relating each topic to other parts of the course. This could be a problem especially if several are used. It is something that we should be aware of and should discuss. Requiring students to write a report involving the topics studied in two ATOMs appeared to be successful at one university.
Many of these points will seem obvious to readers, not so much from hindsight but because they are familiar points in the education literature. They are often rather less familiar to higher education teachers, who seldom or never read that literature, and who have very many such practical details to deal with in delivering any course (another reason for calling them "management" issues). This suggests that many gains in learning and teaching quality might be made, not by technical and pedagogical innovation, but by attention to best practice at this management level, backed by integrative evaluation to detect and feed back those points that emerge strongly as issues in each particular case.