TESTING AND TEACHING
A Very Brief Introduction
Tran Huy Phuong
ENGLISH DEPARTMENT
First, it is important to understand what
the difference is between testing and teaching. In some ways
the two are so interwoven and interdependent that it is difficult
to tease them apart. Every instructional sequence, if it is of
any worth at all, has a testing component to it, whether the
tests themselves are formal or informal. That is, teachers
measure or judge learners' competence all the time and,
ideally, learners measure and judge themselves. Whenever a
student responds to a question or tries out a new word or
structure, you might be testing that student. Written work is
a test. Oral work is a test. Reading and writing performance
are tests. How, then, are testing and teaching different?
The difference lies in what we'll call
formal and informal testing. The above examples referred to informal
testing: unplanned assessments that are made as a course moves
along toward its goals. Most informal testing is what
testing experts call formative evaluation: assessing
students in the process of 'forming' their competencies and
skills with the goal of helping them to continue that growth
process. Our success as teachers is greatly dependent on this
constant informal assessment for it tells us how well learners
are progressing toward goals and what the next step in the
learning process might be. Formal tests are exercises
or experiences specifically designed to tap into an extensive
storehouse of skills and knowledge, usually within a
relatively short time limit. They are systematic, planned
sampling techniques constructed to give teacher and student an
appraisal, as it were, of their achievement. Such tests are
often summative, as they occur at the end of a unit,
module, or course, and therefore attempt to measure, or
summarize, what a student has grasped.
Pedagogically, it is very important for you
to make the distinction between teaching and formal testing,
especially from the point of view of principles of intrinsic
motivation. For optimal learning to take place, students must
have the freedom in the classroom to experiment, to try things
out, to 'test' their own hypotheses about language without
feeling that their overall competence is being 'judged' in
terms of these trials and errors. In the same way that, say,
tournament tennis players must have the freedom to practice
their skills - with no implications for their final placement
- before the tournament itself begins, so also must your
learners have ample opportunities to 'play' with language in
your classroom without being formally graded.
Teaching, then, sets up the practice games
of language learning, the opportunities for learners to listen
and think and take risks and set goals and process feedback
and cycle and recycle through whatever it is that they are
trying to set in place. Formal testing, on the other hand, places
a different set of expectations on students. Formal tests are
the tournament games, or the 'recitals' that periodically
occur in the learning process.
The effect of testing on teaching and
learning is known as backwash. Backwash can be harmful
or beneficial. If a test is regarded as important, then
preparation for it can come to dominate all teaching and
learning activities. And if the test content and testing
techniques are at variance with the objectives of the course,
then there is likely to be harmful backwash.
However, backwash need not always be
harmful; indeed it can be positively beneficial. A test is
likely to bring about beneficial backwash if it is based
directly on an analysis of the English language needs of
the students and includes tasks as similar as possible to
those they will have to perform in their future use of the
language. The relationship between teaching and testing is
surely that of partnership. It is true that there may be
occasions when the teaching is good and appropriate and the
testing is not; we are then likely to suffer from harmful
backwash. But equally there may be occasions when teaching is
poor or inappropriate and when testing is able to exert a
beneficial influence, prompting changes such as a redesigned
syllabus, new teaching techniques, or classes conducted
differently. We cannot expect testing only to follow
teaching. What we should demand of it, however, is that it
should be supportive of good teaching and, where necessary,
exert a corrective influence on bad teaching. If testing
always had a beneficial backwash on teaching, it would have a
much better reputation amongst teachers.
As described above, testing can be broadly
classified as either summative or formative.
Summative testing often occurs at the end of a course, and it
is designed to evaluate the sum total of what has been taught
and learned. There are usually no opportunities for further
input or performance. The most common example of a summative
test is a final exam. Formative tests, on the other hand, are
designed to help "form" or shape the learners'
ongoing understanding or skills while the teacher and learners
still have opportunities to interact for the purposes of
repair and improvement. Examples include quizzes (5-15
minutes), class interaction activities such as paired
interviews, and chapter or unit tests. Unfortunately, however,
even tests that are usually formative, such as quizzes, become
summative instruments when they are used simply as
opportunities to put grades in a gradebook to be averaged at
the end of a given term. A sufficient amount of formative
testing must be done in the classroom in order to enable
students to revisit and review the material in a variety of
ways.
Research on foreign language testing has
distinguished between two different testing formats: (1)
discrete point tests and (2) integrative or global tests. The
contrast between these two formats often reflects different
teaching philosophies. A discrete point test focuses on
one linguistic component at a time: grammar, vocabulary,
syntax, or phonology. Test items include a variety of formats,
such as multiple-choice, true-false, matching, and completion,
in which the language is segmented into parts for individual
analysis. Discrete point tests have traditionally featured
unconnected sentences lacking in meaningful or natural
contexts. These tests have also tended to assess one skill at
a time, such as listening, reading, or writing. Unlike
discrete point tests, integrative or global tests
assess the student's ability to use various components of the
language at the same time, and often multiple skills as well.
For example, an integrative test might ask students to listen
to a taped segment, identify main ideas, and then use the
information as the topic for discussion, as the theme of a
composition, or to compare the segment to a reading on the
same topic.
Another important distinction that should
be made in testing is that between objective and subjective
tests. Usually these two types of tests are distinguished
on the basis of the manner in which they are scored. An
objective test is said to be one that may be scored by
comparing examinee responses with an established set of
acceptable responses or scoring key. No particular
knowledge or training in the examined content area is required
on the part of the scorer. A common example would be a
multiple-choice recognition test. Conversely a subjective test
is said to require scoring by opinionated judgment, hopefully
based on insight and expertise, on the part of the scorer. An
example might be the scoring of free, written compositions for
the presence of creativity in a situation where no operational
definitions of creativity are provided and where there is only
one rater. Many tests, such as cloze tests permitting all
grammatically acceptable responses to systematic deletions
from a context, lie somewhere between the extremes of
objectivity and subjectivity. So-called subjective tests such
as free compositions are frequently objectified in
scoring through the use of precise rating schedules
clearly specifying the kinds of errors to be quantified, or
through the use of multiple independent raters.
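To make the scoring distinction concrete, here is a minimal
sketch in Python (the answer key, responses, and rater marks
are invented for illustration): an objective test is scored by
mechanical comparison with a key, while a subjective rating can
be partly objectified by averaging several independent raters'
marks.

    # A minimal sketch of the two scoring approaches described above.
    # All data here is invented for illustration.

    ANSWER_KEY = ["b", "d", "a", "a", "c"]  # the established scoring key

    def score_objective(responses):
        """Objective scoring: compare each response with the key.
        No judgment or content expertise is required of the scorer."""
        return sum(1 for given, correct in zip(responses, ANSWER_KEY)
                   if given == correct)

    def objectified_subjective_score(rater_marks):
        """One way to objectify a subjective score (e.g., for a free
        composition): average the marks of multiple independent raters."""
        return sum(rater_marks) / len(rater_marks)

    print(score_objective(["b", "d", "b", "a", "c"]))   # -> 4 (out of 5)
    print(objectified_subjective_score([14, 16, 15]))   # -> 15.0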
Types of language tests
On the basis of their purposes, language
tests can be classified into five types:
Proficiency tests
Proficiency tests are designed to measure people's
ability in a language regardless of any training they may have
had in that language. The content of a proficiency test,
therefore, is not based on the content or objectives of
language courses which people taking the test may have
followed. Rather, it is based on a specification of what
candidates have to be able to do in the language in order to
be considered proficient, that is, as having sufficient
command of the language for a particular purpose.
Achievement tests
Most teachers are unlikely to be responsible for
proficiency tests. It is much more probable that they will be
involved in the preparation and use of achievement tests. In
contrast to proficiency tests, achievement tests are directly
related to language courses, their purpose being to establish
how successful individual students, groups of students, or the
courses themselves have been in achieving objectives. They are
of two kinds: final achievement tests and progress
achievement tests.
Final achievement tests are those
administered at the end of a course of study. Clearly the
content of these tests must be related to the courses with
which they are concerned. Some language testers argue that the
content of a final achievement test should be based directly
on a detailed course syllabus or on the books and other
materials used. Since the test contains only what the
students are thought to have actually encountered, it can be
considered a fair test. The disadvantage is that if the
syllabus is badly designed, or the books and other materials
are badly chosen, then the results of the test can be very
misleading. Successful performance on the test may not truly
indicate successful achievement of the course objectives.
The alternative is to base the test content
directly on the objectives of the course. This has a number of
advantages. First, it forces course designers to be explicit
about objectives. Secondly, it makes it possible for
performance on the test to show just how far students have
achieved those
objectives. In turn, this puts pressure on those responsible
for the syllabus and for the selection of books and materials
to ensure that these are consistent with the course
objectives. Tests based on objectives work against the
perpetuation of poor teaching practice, provide more accurate
information about individual and group achievement, and
promote a more beneficial backwash effect
on teaching.
Progress achievement tests, as their name
suggests, are intended to measure the progress that students
are making. Since 'progress' is towards the achievement of
course objectives, these tests too should relate to
objectives.
Diagnostic tests
Diagnostic tests are used to identify students' strengths
and weaknesses. They are intended primarily to ascertain what
further teaching is necessary. Indeed existing proficiency
tests may often prove adequate for this purpose.
Placement tests
Placement tests, as their name suggests, are intended to
provide information which will help to place students at the
stage (or in the part) of the teaching program most
appropriate to their abilities. Typically they are used to
assign students to classes at different levels. The placement
tests which are most successful are those constructed for
particular situations. They depend on the identification of
the key features at different levels of teaching in the
institution.
Aptitude tests
These tests are designed to assist in deciding who should be
allowed to participate in a particular course. This
becomes a matter of serious concern when there are more
applicants than spaces available. Such selection decisions are
often made by determining who is most likely to benefit from
instruction and attain mastery of the language or content area. In
the area of language testing, aptitude tests are commonly used
to predict the success or failure of prospective students in a
language-learning program.
Some other related issues
Norms vs. Criteria
A test which is designed to relate one
candidate's performance on the test to that of other
candidates is known as a norm-referenced test. On the
other hand, there are tests that do not give this kind of
information, but tell us something about what the candidate
can actually do in the language. These tests are said to be criterion-referenced.
The purpose of criterion-referenced tests is to classify people according
to whether or not they are able to perform some task or set of
tasks - such as exchanging information on personal background,
making requests and suggestions, expressing attitudes, etc. -
satisfactorily. The tasks are set, and the performances are
evaluated. Those who perform them satisfactorily 'pass'; those
who don't 'fail'. This means that students are encouraged to
measure their progress in relation to meaningful criteria,
without feeling that, because they are less able than most of
their fellows, they are destined to fail.
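The contrast can be illustrated with a short sketch, again in
Python with invented scores: a norm-referenced interpretation
reports where a candidate stands relative to the other
candidates, while a criterion-referenced interpretation checks
the same score against a fixed standard.

    # Invented candidate scores, for illustration only.
    scores = [42, 55, 61, 68, 73, 80, 88]

    def percentile_rank(score, all_scores):
        """Norm-referenced view: the percentage of candidates
        who scored below this candidate."""
        below = sum(1 for s in all_scores if s < score)
        return 100.0 * below / len(all_scores)

    def meets_criterion(score, cutoff=70):
        """Criterion-referenced view: pass or fail against a fixed
        cut-off, regardless of how other candidates performed."""
        return score >= cutoff

    print(percentile_rank(73, scores))  # -> about 57: above 4 of 7 candidates
    print(meets_criterion(73))          # -> True: the criterion is satisfied

Note that the two interpretations can disagree: a score of 68
beats three of the seven candidates yet still fails the
criterion.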
Reliability
All tests are subject to inaccuracies. The
ultimate scores received by the test-takers only provide approximate
estimations of their true abilities. While some measurement
error is unavoidable, it is possible to quantify and greatly
minimize the presence of measurement error. A test on which
the scores obtained are generally similar when it is
administered to the same students with the same ability, but
at a different time is said to be a reliable test. Test
reliability is also related to test length: longer tests tend
to be more reliable than shorter ones, so the more important
the decisions to be based on the results, the more items the
test should contain.
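The length-reliability relationship can be quantified with the
Spearman-Brown prophecy formula (a standard result in
measurement theory, not taken from the sources cited here): if
a test with reliability r is lengthened by a factor of n using
comparable items, its predicted reliability is

    r_{new} = \frac{n \, r}{1 + (n - 1) \, r}

For example, doubling (n = 2) a test with reliability 0.60
predicts a new reliability of 1.2 / 1.6 = 0.75, which is why
tests supporting important decisions tend to contain more
items.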
Validity
What is the test being used for? Is it
valid for its supposed purpose(s)? A test is said to be valid
if it measures accurately what it is intended to measure.
Tests should have content validity, criterion-related validity
and face validity. A language test is said to have content
validity if it includes a proper sample of language
skills, structures, etc. relevant to the purpose of the test. Criterion-related
validity is the extent to which the results of the test
agree with those provided by some independent and highly
dependable assessment of the candidate's ability. The
independent assessment is thus the criterion measure against
which the test is validated. A test also needs to look as if
it measures what it is supposed to measure. Such a test is
said to have face validity. Face validity is hardly a
scientific concept, yet it is very important since the lack of
it may make the test unacceptable to candidates, teachers,
education authorities, etc.
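In practice, criterion-related validity is usually reported as
a correlation coefficient between scores on the test and
scores on the criterion measure. The sketch below, in Python
with invented paired scores, computes such a validity
coefficient as a Pearson correlation.

    import math

    # Invented paired scores: the test being validated and an
    # independent, dependable criterion measure of the same ability.
    test_scores      = [55, 60, 65, 70, 80, 85]
    criterion_scores = [52, 63, 62, 74, 79, 88]

    def pearson(x, y):
        """Validity coefficient: Pearson correlation between the
        test scores and the criterion scores."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    print(round(pearson(test_scores, criterion_scores), 2))  # -> 0.97

A coefficient this close to 1 would indicate that the test
agrees closely with the criterion measure.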
Test design: Planning for classroom testing
Two basic principles foreign language
teachers follow in the development of tests are to test what
was taught and to test it in a manner that reflects the way in
which it was taught. For example, if students spend 50% of
their class time developing oral skills, then nearly half of
their test should evaluate their oral skills. Current research
in testing argues for a more direct connection between
teaching and testing. The same kinds of activities designed
for classroom interaction can serve as valid testing formats,
with instruction and evaluation more closely integrated. As
Oller points out, "Perhaps the essential insight of a
quarter of a century of language testing (both research and
practice) is that good teaching and good testing are, or ought
to be, nearly indistinguishable". In addition, it is
suggested that language teachers should make extensive use of
formative testing that is integrated into the teaching and
learning process.
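The "test it the way it was taught" principle can be applied
quite mechanically when drawing up a test blueprint: allocate
points to each skill in proportion to the class time spent on
it. A minimal sketch in Python, with invented proportions:

    # Invented class-time proportions, for illustration only.
    time_spent = {"oral skills": 0.50, "reading": 0.25,
                  "writing": 0.15, "listening": 0.10}

    def test_blueprint(total_points, proportions):
        """Allocate test points to each skill in proportion to
        the share of class time devoted to it."""
        return {skill: round(total_points * share)
                for skill, share in proportions.items()}

    print(test_blueprint(100, time_spent))
    # -> {'oral skills': 50, 'reading': 25, 'writing': 15, 'listening': 10}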
Two checklists for test developers
(1) Characteristics of communicative tests:
- test items create an "information gap,"
requiring test takers to process complementary
information through the use of multiple sources of
input, such as a tape recording and a reading selection
on the same topic;
- tasks are interdependent - that is, "tasks in one
section of the test build . . . upon the content of
earlier sections";
- test tasks and content are integrated within a given
domain of communicative interaction;
- tests attempt to measure a broader range of cohesion,
function and socio-linguistic appropriateness than did
earlier tests, which tended to focus on the formal
aspects of language - grammar, vocabulary, and
pronunciation.
(2) General guidelines for designing a chapter or unit test:
- review the objectives for the chapter/unit;
- think of the contexts in which the language was used
in this chapter or unit;
- think of the linguistic functions that students have
learned to perform;
- think of the ways in which students have learned to
interact with one another;
- prepare an integrative test that reflects the types of
activities done in class - what students learned to do
orally should be tested orally, not with paper and
pencil;
- provide opportunities for students to use global
language skills in a naturalistic context;
- provide a model whenever possible to illustrate what
students are to do;
- provide instructions in the native language until you
are certain that learners' ability to perform the task
is not limited by a misunderstanding of the
instructions;
- develop a grading system that rewards both linguistic
accuracy and creativity;
- return graded tests promptly to show students their
progress.
References
Brown, H. D. (1994). Teaching by Principles: An Interactive
Approach to Language Pedagogy. New Jersey: Prentice Hall
Regents.
Henning, G. (1987). A Guide to Language Testing.
Boston, Massachusetts: Heinle & Heinle Publishers.
Hughes, A. (1990). Testing for Language Teachers.
Cambridge: Cambridge University Press.
Shrum, J. & Glisan, E. (1994). Contextualized
Language Instruction. Boston, Massachusetts: Heinle
& Heinle Publishers.