Computer-Based Testing Vs. Paper-Based Testing

What are the advantages and disadvantages, and what is the future of language testing?

The situation now

On 25 February 2011 the BBC Today Programme invited Isabel Nisbet, the outgoing chief of the UK-based qualifications watchdog, Ofqual, to come on and discuss computer-based testing (CBT). Ms Nisbet said that the general attitude to CBT had been that it was too difficult a topic to raise, because of the number of pitfalls and complications and the level of opposition. However, as outgoing chief she had decided to ‘move it off the too difficult pile’ and try to get the subject addressed, because she felt it was an important issue. Her explanation was that because children do much of their learning and exploring digitally, they should be assessed in the same way. Her exact words were ‘In the future, how things are tested should match how people learn and how they act.’ This echoes one of the most important issues in language testing, raised by Bachman and Palmer (1996): tests should resemble the real thing for which they are testing. This is authenticity, one of their criteria for test usefulness. Because people, especially younger generations, will interact with the world largely through a computer, testing and assessment should reflect that. This is certainly worth discussing, as it will undoubtedly have international repercussions across all areas of education.

The advantages

CBT allows for more accurate, secure, rapid and controlled test administration at every stage: from students sitting the test, to tests being marked and results being published, all the way through to researching the data and evaluating the test. Critics of CBT might argue against this, but I think any scepticism here stems from a mistrust of technology rather than a genuine belief that paper-based testing (PBT) is actually better in these respects. As long as the computers are reliable and secure, there is no reason to doubt the claim that CBT is far superior. I will address the problems in the next section.

Another great advantage is the one voiced by Nisbet: the authenticity of the tests, which reflect the real-world situations in which the skills being tested are used. In language testing, this is referred to as the Target Language Use (TLU) domain, but it also applies to other fields. If, upon graduation, you mainly compose emails in French to colleagues and rarely write postcards on paper, then the test you sat to graduate should reflect this. For my GCSEs I wrote a postcard in French as part of the test. I remember I wrote a nice little postcard and then turned the page, only to discover to my horror that there was a whole other page of blank space in which to write the ‘postcard’. I was incredibly angry about this, because postcards are short. The test didn’t even match what a ‘postcard’ was in reality. If I were taking that test today, I would be equally annoyed if I were asked to compose an ‘email’ and in fact had to write it with a pen. I have seen many examples of this kind of inauthenticity, caused by writing on the wrong medium, in test preparation courses that I have used as a teacher. This lack of authenticity not only harms students by not testing them in the context in which they will use their skills, but also damages the face validity of the test itself, which could lead to resentment and loss of motivation.

In addition, administering a test on the computer almost entirely eliminates the need for printing. This could reduce administration costs as well as environmental impact, assuming of course that the institution does not have to buy computers especially for the test. Also, because computers can mark any objective sections (where each answer is clearly right or wrong) almost instantaneously, there is no longer any need to pay humans to go through scripts with marking grids. This speeds up results and feedback, cuts costs and improves the accuracy of marking.
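
To make the point about objective marking concrete, here is a minimal sketch of how a computer might mark a multiple-choice section against an answer key. The item names, the answer key and the function name are all hypothetical, invented for illustration; this is not how any particular test provider does it.

```python
# Hypothetical answer key for a four-item multiple-choice section.
ANSWER_KEY = {"q1": "b", "q2": "d", "q3": "a", "q4": "c"}


def mark_objective_section(responses, answer_key):
    """Return (score, per-item results) for one candidate.

    An unanswered item counts as wrong. Answers are compared after
    trimming whitespace and ignoring case.
    """
    results = {}
    for item, correct in answer_key.items():
        given = responses.get(item, "")
        results[item] = given.strip().lower() == correct
    return sum(results.values()), results


# A made-up candidate who left q4 blank.
candidate = {"q1": "B", "q2": "a", "q3": "a"}
score, detail = mark_objective_section(candidate, ANSWER_KEY)
print(f"{score}/{len(ANSWER_KEY)}")
```

Because the per-item results are kept as well as the total, the same data could feed straight into feedback for the student or item analysis for the test designers.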

Other studies have investigated the differences between CBT and PBT (see further reading). Most of them conclude that CBT is advantageous for students and test administrators alike. So, why the opposition to CBT? What are the dangers and what is holding us back?

The disadvantages

The main objections to CBT are its expense, technical problems, over-dependence on computers, and the worry that something is lost when we move away from PBT.

Earlier I mentioned a possible mistrust of technology which deters both students and institutions from adopting CBT. I have experience of this myself, so I don’t want to come across as a blind technology advocate demanding that we do away with all paper-based tests. I remember coaching Diego, a Spanish student, for his TOEFL iBT. The system we used was an internal practice test whose server recorded audio from the students’ microphones. We were in the UK and the server was in the US. On top of that, we had a very slow connection and, because we were in a very built-up area, internet contention was also very high. As a result, two or three times in every practice test a few students would lose their speaking test answers. Many students were irritated by this poor and unreliable technology, especially because they were paying a lot for their courses. I always used to say ‘it won’t happen when you take the real test’. Sadly, it did happen to Diego. The computer on which he was working during the real test shut down in the middle of the exam. He was not allowed to re-take it and had to pay again to do the test.

Of course, another problem here is that human error can never be completely accounted for when using computers. Diego was the kind of student who was often on the receiving end of inexplicable technical errors. He once kicked the power switch at the socket by mistake and shut down a neighbour’s computer. However, part of setting up computers for use in class is ensuring that the computers are secure and the workstation is appropriate (i.e. power sockets and cables cannot be disturbed accidentally). Therein lies the problem. CBT does not eliminate human error, and the line between computer error and human error is very fine. In addition, computers do often go wrong, especially older machines and public computer terminals which have hundreds of different users. Administering even thirty or so networked public machines is a full-time job. Computer viruses, bad configuration, faulty hardware and unreliable internet connections all contribute to this. On top of that, students and teachers need training to use the machines and the specific software. So, computers are no panacea in education.

However, let me point out that humans are just as prone to error. Test results get lost in the post, and teachers take essays home to mark and never return them. Then there is the look on a student’s face when you hand back everyone’s homework except theirs, and they swear they handed it in. Yes, there are plenty of reasons to look for a more secure and reliable alternative to paper-based tests.

These risks can be managed as long as there is a skilled and reliable technician on hand both before the test (to check the machines are correctly set up and maintained) and during it. Test instructions and support on the use of the computers must also be clear and easy to follow, regardless of the candidate’s level of computer or language ability. The connections over which answers and results are sent must be secure and reliable. Where possible, the machines should be up to date and regularly serviced to avoid malfunctions. These things apply not just to CBT but to any use of computers. However, where the stakes are higher (as with an institutionalised language test), the need to ensure all of them are in place increases.

The future

The last ten or fifteen years have seen computers fall in price and rise significantly in reliability and power. Most schools, universities and private language institutions in developed countries are well equipped to offer students computer and internet access. For this reason, it seems fitting that people such as Isabel Nisbet should raise the issue of introducing computer-based testing to replace traditional paper-based tests on a national scale. As with many technological advances, language teaching may well be in the vanguard of this conversion to CBT. That means it is more likely the technology will be tried and tested by the time it takes over elsewhere. It is not likely to happen overnight, or even within the next five years. What is certain is that large institutionalised tests will have to offer students the option of taking the test on the computer, and gradually the PBTs will be phased out. TOEFL iBT is already doing this, with TOEFL PBT fast becoming obsolete. However, ETS has had to redesign much of the test and the way it is administered in order to make the most of the computer-based format. For this reason I have a lot of respect for the TOEFL iBT. TOEIC, by contrast, remains purely paper-based. The Cambridge ESOL suite offers both PBT and CBT versions of all its tests, except IELTS. IELTS can be taken on the computer only in the Delhi centre, although it is likely the IELTS CBT will be rolled out worldwide very soon (please continue to check www.engnet-education.com for updates and news on this).

Conclusion

It is a gradual process, much like the way CDs replaced cassette tapes. Of course, language teaching is one of those rare professions where teachers around the world can still be seen running to and from the staffroom clutching pre-wound cassettes while the rest of the world is using mp3s. There are exceptions, but my point is that there will not be an overnight switch to CBT, even if there is a national, government-level initiative. For this reason it is important to continue researching and promoting the use of CBT, because it does reflect the way things are going. We use computers more and more for communication, and our tests should reflect that.

Further reading

Bachman, L.F., Palmer, A.S., 1996. Language Testing in Practice. Oxford University Press, Oxford.

Extended Bibliography

Bachman, L. 1990: Fundamental considerations in language testing. Oxford University Press, Oxford.

Bachman, L.F., Palmer, A.S., 1996. Language Testing in Practice. Oxford University Press, Oxford.

Bachman, L. 2001 Designing and developing useful language tests in Elder, C. Brown, A. Grove, E. Hill, K. Iwashita, N. Lumley, T. McNamara, T. O’Loughlin, K. (Eds.) Studies in Language Testing 11: Experimenting with uncertainty. Essays in honour of Alan Davies. Cambridge University Press, Cambridge

Brown, H. Douglas. 2004 Language Assessment: Principles and Classroom Practices Longman : New York

Chalhoub-Deville, Micheline & Deville, Craig 2005 A look back at and forward to what language testers measure In Hinkel, Eli (Ed.) Handbook of research in second language teaching and learning Routledge

Chapelle, Carol A., Enright, Mary K. & Jamieson, Joan M. (Eds.) 2008 Building a Validity Argument for the Test of English as a Foreign Language Routledge, New York

Douglas, Dan 2001 Three problems in testing language for specific purposes: Authenticity, specificity and inseparability in Elder, C. Brown, A. Grove, E. Hill, K. Iwashita, N. Lumley, T. McNamara, T. O’Loughlin, K. (Eds.) Studies in Language Testing 11: Experimenting with uncertainty. Essays in honour of Alan Davies Cambridge University Press, Cambridge

Downey, R. Farhady, H. Present-Thomas, R. Suzuki, M. Van Moere, A. 2008 Evaluation of the Usefulness of the Versant for English Test: A Response Language Assessment Quarterly, Volume 5, Issue 2 April 2008, pages 160 – 167

Fulcher 2000 The ‘communicative’ legacy in language testing System, 28 (4), p.483-497, Dec 2000 doi:10.1016/S0346-251X(00)00033-6

Hughes, A. 1989 Testing for Language Teachers Cambridge University Press, Cambridge

Lewkowicz, J.A., 1997. Authenticity for whom? Does authenticity really matter? In: Huhta, A., Kohonen, V., Kurki-Suonio, L., Luoma, S. (Eds.), Current Developments and Alternatives in Language Assessment. Jyvaskyla University, Finland, pp. 165-184.

Lewkowicz, J.A., 2000. Authenticity in language testing: some outstanding questions. Language Testing 17 (1), 43-64. Cambridge University Press DOI: 10.1080/15434300801934744

Lynch, Brian K. 2003 Language Assessment and Programme Evaluation Edinburgh University Press, Edinburgh

McNamara 2006 Validity in Language Testing: The Challenge of Sam Messick’s Legacy Language Assessment Quarterly, Volume 3, Issue 1 January 2006 , pages 31 – 51 DOI: 10.1207/s15434311laq0301_3

Messick, S., 1989. Validity. In: Linn, R.L. (Ed.), Educational Measurement. Macmillan, New York, pp. 13 -103.

Morrow, K., 1979. Communicative language testing: revolution or evolution? In: Brumfit, C.J., Johnson, K. (Eds.), The Communicative Approach to Language Teaching. Oxford University Press, Oxford, pp. 143-159.

North, Brian. 2007 Expanded set of C1 & C2 Descriptors http://www.coe.int/T/DG4/Portfolio/?L=E&M=/documents_intro/Data_bank_descriptors.html

O’Malley, J. Michael & Valdez Pierce, Lorraine 1996 Authentic Assessment for English Language Learners: Practical Approaches for Teachers Longman

Phakiti, Aek. 2008 Construct validation of Bachman and Palmer’s (1996) strategic competence model over time in EFL reading tests Language Testing; 25; 237 DOI: 10.1177/0265532207086783

Popham, W. James 1990, Modern Educational Measurement Prentice Hall, Englewood

Stoynoff & Chapelle 2005 ESOL Tests and Testing Teachers of English to Speakers of Other Languages, Inc. Maryland

Stoynoff, Stephen 2009 State-of-the-Art Article: Recent developments in language assessment and the case of four large-scale tests of ESOL Language Teaching 42:1, 1–40 Cambridge University Press DOI:10.1017/S0261444808005399

Widdowson, Henry 1979: Explorations in applied linguistics. Oxford University Press, Oxford

Widdowson, Henry 2001 Communicative language testing: the art of the possible in Elder, C. Brown, A. Grove, E. Hill, K. Iwashita, N. Lumley, T. McNamara, T. O’Loughlin, K. (Eds.) Studies in Language Testing 11: Experimenting with uncertainty. Essays in honour of Alan Davies Cambridge University Press, Cambridge

Widdowson, Henry 1983 Learning Purpose and Language Use. Oxford University Press, Oxford.