TESTING AND TEACHING
A Very Brief Introduction
Tran Huy Phuong
ENGLISH DEPARTMENT
First, it is important to understand what
the difference is between testing and teaching. In some ways
the two are so interwoven and interdependent that it is difficult
to tease them apart. Every instructional sequence, if it is of
any worth at all, has a testing component to it, whether the
tests themselves are formal or informal. That is, teachers
measure or judge learners' competence all the time and,
ideally, learners measure and judge themselves. Whenever a
student responds to a question or tries out a new word or
structure, you might be testing that student. Written work is
a test. Oral work is a test. Reading and writing performance
are tests. How, then, are testing and teaching different?
The difference lies in what we'll call
formal and informal testing. The above examples referred to informal
testing: unplanned assessments that are made as a course moves
along toward its goals. Most informal testing is what
testing experts call formative evaluation: assessing
students in the process of 'forming' their competencies and
skills with the goal of helping them to continue that growth
process. Our success as teachers is greatly dependent on this
constant informal assessment for it tells us how well learners
are progressing toward goals and what the next step in the
learning process might be. Formal tests are exercises
or experiences specifically designed to tap into an extensive
storehouse of skills and knowledge, usually within a
relatively short time limit. They are systematic, planned
sampling techniques constructed to give teacher and student an
appraisal, as it were, of their achievement. Such tests are
often summative, as they occur at the end of a unit,
module, or course, and therefore attempt to measure, or
summarize, what a student has grasped.
Pedagogically, it is very important for you
to make the distinction between teaching and formal testing,
especially from the point of view of principles of intrinsic
motivation. For optimal learning to take place, students must
have the freedom in the classroom to experiment, to try things
out, to 'test' their own hypotheses about language without
feeling that their overall competence is being 'judged' in
terms of these trials and errors. In the same way that, say,
tournament tennis players must have the freedom to practice
their skills - with no implications for their final placement
- before the tournament itself begins, so also must your
learners have ample opportunities to 'play' with language in
your classroom without being formally graded.
Teaching, then, sets up the practice games
of language learning, the opportunities for learners to listen
and think and take risks and set goals and process feedback
and cycle and recycle through whatever it is that they are
trying to set in place. Formal testing, on the other hand, places
a different set of expectations on students. Formal tests are
the tournament games, or the 'recitals' that periodically
occur in the learning process.
The effect of testing on teaching and
learning is known as backwash. Backwash can be harmful
or beneficial. If a test is regarded as important, then
preparation for it can come to dominate all teaching and
learning activities. And if the test content and testing
techniques are at variance with the objectives of the course,
then there is likely to be harmful backwash.
However, backwash need not always be
harmful; indeed it can be positively beneficial. A test is
likely to bring about beneficial backwash if it is based
directly on an analysis of the English language needs of
the students and includes tasks as similar as possible to
those they will have to perform in their future use of the
language. The relationship between teaching and testing is
surely that of partnership. It is true that there may be
occasions when the teaching is good and appropriate and the
testing is not; we are then likely to suffer from harmful
backwash. But equally there may be occasions when teaching is
poor or inappropriate and when testing is able to exert a
beneficial influence, prompting changes such as a redesigned
syllabus, new teaching techniques, or classes conducted
differently. We cannot expect testing only to follow
teaching. What we should demand of it, however, is that it
should be supportive of good teaching and, where necessary,
exert a corrective influence on bad teaching. If testing
always had a beneficial backwash on teaching, it would have a
much better reputation amongst teachers.
As described above, testing can be broadly
classified as either summative or formative.
Summative testing often occurs at the end of a course, and it
is designed to evaluate the sum total of what has been taught
and learned. There are usually no opportunities for further
input or performance. The most common example of a summative
test is a final exam. Formative tests, on the other hand, are
designed to help "form" or shape the learners'
ongoing understanding or skills while the teacher and learners
still have opportunities to interact for the purposes of
repair and improvement. Examples include quizzes (5-15
minutes), class interaction activities such as paired
interviews, and chapter or unit tests. Unfortunately, however,
even tests that are usually formative, such as quizzes, become
summative instruments when they are used simply as
opportunities to put grades in a gradebook to be averaged at
the end of a given term. A sufficient amount of formative
testing must be done in the classroom in order to enable
students to revisit and review the material in a variety of
ways.
Research on foreign language testing has
distinguished between two different testing formats: (1)
discrete point tests and (2) integrative or global tests. The
contrast between these two formats often reflects different
teaching philosophies. A discrete point test focuses on
one linguistic component at a time: grammar, vocabulary,
syntax, or phonology. Test items include a variety of formats,
such as multiple-choice, true-false, matching, and completion,
in which the language is segmented into parts for individual
analysis. Discrete point tests have traditionally featured
unconnected sentences lacking in meaningful or natural
contexts. These tests have also tended to assess one skill at
a time, such as listening, reading, or writing. Unlike
discrete point tests, integrative or global tests
assess the student's ability to use various components of the
language at the same time, and often multiple skills as well.
For example, an integrative test might ask students to listen
to a taped segment, identify main ideas, and then use the
information as the topic for discussion, as the theme of a
composition, or to compare the segment to a reading on the
same topic.
Another important distinction that should
be made in testing is that between objective and subjective
tests. Usually these two types of tests are distinguished
on the basis of the manner in which they are scored. An
objective test is said to be one that may be scored by
comparing examinee responses with an established set of
acceptable responses or scoring key. No particular
knowledge or training in the examined content area is required
on the part of the scorer. A common example would be a
multiple-choice recognition test. Conversely a subjective test
is said to require scoring by opinionated judgment, hopefully
based on insight and expertise, on the part of the scorer. An
example might be the scoring of free, written compositions for
the presence of creativity in a situation where no operational
definitions of creativity are provided and where there is only
one rater. Many tests, such as cloze tests permitting all
grammatically acceptable responses to systematic deletions
from a context, lie somewhere between the extremes of
objectivity and subjectivity. So-called subjective tests such
as free compositions are frequently objectified in
scoring through the use of precise rating schedules
clearly specifying the kinds of errors to be quantified, or
through the use of multiple independent raters.
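To make the scoring distinction concrete, here is a minimal
sketch in Python (the answer key, responses, and rater marks
are invented for illustration): an objective test is scored by
mechanical comparison with a key, while a subjective rating can
be partly objectified by averaging several independent raters'
marks.

    # A minimal sketch of the two scoring approaches described above.
    # All data here is invented for illustration.

    ANSWER_KEY = ["b", "d", "a", "a", "c"]  # the established scoring key

    def score_objective(responses):
        """Objective scoring: compare each response with the key.
        No judgment or content expertise is required of the scorer."""
        return sum(1 for given, correct in zip(responses, ANSWER_KEY)
                   if given == correct)

    def objectified_subjective_score(rater_marks):
        """One way to objectify a subjective score (e.g., for a free
        composition): average the marks of multiple independent raters."""
        return sum(rater_marks) / len(rater_marks)

    print(score_objective(["b", "d", "b", "a", "c"]))   # -> 4 (out of 5)
    print(objectified_subjective_score([14, 16, 15]))   # -> 15.0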
Types of language tests
On the basis of their purposes, language
tests can be classified into five types:
Proficiency tests
Proficiency tests are designed to measure people's
ability in a language regardless of any training they may have
had in that language. The content of a proficiency test,
therefore, is not based on the content or objectives of
language courses which people taking the test may have
followed. Rather, it is based on a specification of what
candidates have to be able to do in the language in order to
be considered proficient, that is, as having sufficient
command of the language for a particular purpose.
Achievement tests
Most teachers are unlikely to be responsible for
proficiency tests. It is much more probable that they will be
involved in the preparation and use of achievement tests. In
contrast to proficiency tests, achievement tests are directly
related to language courses, their purpose being to establish
how successful individual students, groups of students, or the
courses themselves have been in achieving objectives. They are
of two kinds: final achievement tests and progress
achievement tests.
Final achievement tests are those
administered at the end of a course of study. Clearly the
content of these tests must be related to the courses with
which they are concerned. Some language testers argue that the
content of a final achievement test should be based directly
on a detailed course syllabus or on the books and other
materials used. Since the test contains only what the
students are thought to have actually encountered, it can be
considered a fair test. The disadvantage is that if the
syllabus is badly designed, or the books and other materials
are badly chosen, then the results of the test can be very
misleading. Successful performance on the test may not truly
indicate successful achievement of the course objectives.
The alternative is to base the test content
directly on the objectives of the course. This has a number of
advantages. First, it forces course designers to be explicit
about objectives. Secondly, it makes it possible for
performance on the test to show just how far students have
achieved those
objectives. In turn, this puts pressure on those responsible
for the syllabus and for the selection of books and materials
to ensure that these are consistent with the course
objectives. Tests based on objectives work against the
perpetuation of poor teaching practice, provide more accurate
information about individual and group achievement, and
promote a more beneficial backwash effect
on teaching.
Progress achievement tests, as their name
suggests, are intended to measure the progress that students
are making. Since 'progress' is towards the achievement of
course objectives, these tests too should relate to
objectives.
Diagnostic tests
Diagnostic tests are used to identify students' strengths
and weaknesses. They are intended primarily to ascertain what
further teaching is necessary. Indeed existing proficiency
tests may often prove adequate for this purpose.
Placement tests
Placement tests, as their name suggests, are intended to
provide information which will help to place students at the
stage (or in the part) of the teaching program most
appropriate to their abilities. Typically they are used to
assign students to classes at different levels. The placement
tests which are most successful are those constructed for
particular situations. They depend on the identification of
the key features at different levels of teaching in the
institution.
Aptitude tests
These tests are designed to assist in deciding who should be
allowed to participate in a particular course. This
becomes a matter of serious concern when there are more
applicants than spaces available. Such selection decisions are
often made by determining who is most likely to benefit from
instruction and attain mastery of the language or content area. In
the area of language testing, aptitude tests are commonly used
to predict the success or failure of prospective students in a
language-learning program.
Some other related issues
Norms vs. Criteria
A test which is designed to relate one
candidate's performance on the test to that of other
candidates is known as a norm-referenced test. On the
other hand, there are tests that do not give this kind of
information, but tell us something about what the candidate
can actually do in the language. These tests are said to be criterion-referenced.
The purpose of criterion-referenced tests is to classify people according
to whether or not they are able to perform some task or set of
tasks - such as exchanging information on personal background,
making requests and suggestions, expressing attitudes, etc. -
satisfactorily. The tasks are set, and the performances are
evaluated. Those who perform them satisfactorily 'pass'; those
who don't 'fail'. This means that students are encouraged to
measure their progress in relation to meaningful criteria,
without feeling that, because they are less able than most of
their fellows, they are destined to fail.
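The contrast can be illustrated with a short sketch, again in
Python with invented scores: a norm-referenced interpretation
reports where a candidate stands relative to the other
candidates, while a criterion-referenced interpretation checks
the same score against a fixed standard.

    # Invented candidate scores, for illustration only.
    scores = [42, 55, 61, 68, 73, 80, 88]

    def percentile_rank(score, all_scores):
        """Norm-referenced view: the percentage of candidates
        who scored below this candidate."""
        below = sum(1 for s in all_scores if s < score)
        return 100.0 * below / len(all_scores)

    def meets_criterion(score, cutoff=70):
        """Criterion-referenced view: pass or fail against a fixed
        cut-off, regardless of how other candidates performed."""
        return score >= cutoff

    print(percentile_rank(73, scores))  # -> about 57: above 4 of 7 candidates
    print(meets_criterion(73))          # -> True: the criterion is satisfied

Note that the two interpretations can disagree: a score of 68
beats three of the seven candidates yet still fails the
criterion.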
Reliability
All tests are subject to inaccuracies. The
ultimate scores received by the test-takers only provide approximate
estimations of their true abilities. While some measurement
error is unavoidable, it is possible to quantify and greatly
minimize the presence of measurement error. A test on which
the scores obtained are generally similar when it is
administered to the same students with the same ability, but
at a different time is said to be a reliable test. Test
reliability is also related to test length: longer tests tend
to be more reliable than shorter ones, so the more important
the decisions to be based on the results, the more items the
test should contain.
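The length-reliability relationship can be quantified with the
Spearman-Brown prophecy formula (a standard result in
measurement theory, not taken from the sources cited here): if
a test with reliability r is lengthened by a factor of n using
comparable items, its predicted reliability is

    r_{new} = \frac{n \, r}{1 + (n - 1) \, r}

For example, doubling (n = 2) a test with reliability 0.60
predicts a new reliability of 1.2 / 1.6 = 0.75, which is why
tests supporting important decisions tend to contain more
items.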
Validity
What is the test being used for? Is it
valid for its supposed purpose(s)? A test is said to be valid
if it measures accurately what it is intended to measure.
Tests should have content validity, criterion-related validity
and face validity. A language test is said to have content
validity if it includes a proper sample of language
skills, structures, etc. relevant to the purpose of the test. Criterion-related
validity is the extent to which the results of the test
agree with those provided by some independent and highly
dependable assessment of the candidate's ability. The
independent assessment is thus the criterion measure against
which the test is validated. A test also needs to look as if
it measures what it is supposed to measure. Such a test is
said to have face validity. Face validity is hardly a
scientific concept, yet it is very important since the lack of
it may make the test unacceptable to candidates, teachers,
education authorities, etc.
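In practice, criterion-related validity is usually reported as
a correlation coefficient between scores on the test and
scores on the criterion measure. The sketch below, in Python
with invented paired scores, computes such a validity
coefficient as a Pearson correlation.

    import math

    # Invented paired scores: the test being validated and an
    # independent, dependable criterion measure of the same ability.
    test_scores      = [55, 60, 65, 70, 80, 85]
    criterion_scores = [52, 63, 62, 74, 79, 88]

    def pearson(x, y):
        """Validity coefficient: Pearson correlation between the
        test scores and the criterion scores."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    print(round(pearson(test_scores, criterion_scores), 2))  # -> 0.97

A coefficient this close to 1 would indicate that the test
agrees closely with the criterion measure.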
Test design: Planning for classroom testing
Two basic principles foreign language
teachers follow in the development of tests are to test what
was taught and to test it in a manner that reflects the way in
which it was taught. For example, if students spend 50% of
their class time developing oral skills, then nearly half of
their test should evaluate their oral skills. Current research
in testing argues for a more direct connection between
teaching and testing. The same kinds of activities designed
for classroom interaction can serve as valid testing formats,
with instruction and evaluation more closely integrated. As
Oller points out, "Perhaps the essential insight of a
quarter of a century of language testing (both research and
practice) is that good teaching and good testing are, or ought
to be, nearly indistinguishable". In addition, it is
suggested that language teachers should make extensive use of
formative testing that is integrated into the teaching and
learning process.
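The "test it the way it was taught" principle can be applied
quite mechanically when drawing up a test blueprint: allocate
points to each skill in proportion to the class time spent on
it. A minimal sketch in Python, with invented proportions:

    # Invented class-time proportions, for illustration only.
    time_spent = {"oral skills": 0.50, "reading": 0.25,
                  "writing": 0.15, "listening": 0.10}

    def test_blueprint(total_points, proportions):
        """Allocate test points to each skill in proportion to
        the share of class time devoted to it."""
        return {skill: round(total_points * share)
                for skill, share in proportions.items()}

    print(test_blueprint(100, time_spent))
    # -> {'oral skills': 50, 'reading': 25, 'writing': 15, 'listening': 10}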
Two checklists for test developers
(1) Characteristics of communicative tests:
- test items create an "information gap,"
requiring test takers to process complementary
information through the use of multiple sources of
input, such as a tape recording and a reading selection
on the same topic;
- tasks are interdependent - that is, "tasks in one
section of the test build . . . upon the content of
earlier sections";
- test tasks and content are integrated within a given
domain of communicative interaction;
- tests attempt to measure a broader range of cohesion,
function and socio-linguistic appropriateness than did
earlier tests, which tended to focus on the formal
aspects of language - grammar, vocabulary, and
pronunciation.
(2) General guidelines for designing a chapter or unit test:
- review the objectives for the chapter/unit;
- think of the contexts in which the language was used
in this chapter or unit;
- think of the linguistic functions that students have
learned to perform;
- think of the ways in which students have learned to
interact with one another;
- prepare an integrative test that reflects the types of
activities done in class - what students learned to do
orally should be tested orally, not with paper and
pencil;
- provide opportunities for students to use global
language skills in a naturalistic context;
- provide a model whenever possible to illustrate what
students are to do;
- provide instructions in the native language until you
are certain that learners' ability to perform the task
is not limited by a misunderstanding of the
instructions;
- develop a grading system that rewards both linguistic
accuracy and creativity;
- return graded tests promptly to show students their
progress.
References
Brown, H. D. (1994). Teaching by Principles: An Interactive
Approach to Language Pedagogy. New Jersey: Prentice Hall
Regents.
Henning, G. (1987). A Guide to Language Testing.
Boston, Massachusetts: Heinle & Heinle Publishers.
Hughes, A. (1990). Testing for Language Teachers.
Cambridge: Cambridge University Press.
Shrum, J. & Glisan, E. (1994). Contextualized
Language Instruction. Boston, Massachusetts: Heinle
& Heinle Publishers.