Educational Technology

Evaluation and Tests

Evaluation is an integral part of education. It measures selected knowledge, skills, attitudes and even values in order to find behavioral change in students. Evaluation is also an aid to teaching: teachers need to use it to improve their teaching. A student’s failure to progress reflects, to some extent, the teacher’s failure in teaching. The evaluation of students’ achievements can be used to select better methods of teaching and evaluation. Thus evaluation has two purposes: (i) self-appraisal by the student, and (ii) self-appraisal by the teacher.
As Wrightstone mentions, evaluation emphasizes broad personality changes and involves three steps:

  1. Identifying and formulating objectives
  2. Defining these objectives in terms of behaviour to be realized by the student
  3. Selecting valid, reliable and practical instruments.

The modern concept of evaluation in education has at least three different dimensions.

  1. Evaluation attempts to measure a comprehensive range of behavioral objectives rather than mere knowledge of subject matter.
  2. A variety of evaluative instruments is used, depending upon the availability and applicability of the instruments and the skill of the teacher using them. Some of these instruments are tests, essays, questionnaires and interviews.
  3. Evaluation includes integrating and interpreting various aspects of behaviour into a whole, or into an inclusive picture of a student or of a class of students, as may be required.

Certain important terms in educational measurement and evaluation must be understood in order to master the relevant concepts. They are as follows:

Measurement: The process of quantifying an individual’s achievement, personality, attitudes, habits and skills; or the process by which information about the attributes or characteristics of things is determined and differentiated.

Evaluation: The qualitative aspect of determining the outcomes of learning; the process of ranking with respect to an attribute or trait.

Assessment: A process by which information is gained relative to some known objective or goal.

However, evaluation should not be confused with measurement. The term measurement implies obtaining quantitative evidence, whereas the term evaluation implies that consideration has also been given to certain value standards.

The major purposes of educational evaluation are:

  1. To check the effectiveness of teaching
  2. To check the effectiveness of the institution and system as a whole
  3. To assess the progress of the student
  4. To select the students for higher courses, specializations in UG & PG and for jobs.
  5. To indicate points and levels of improvement
  6. To validate and verify the hypotheses upon which teachers base their teaching and evaluation of students
  7. To guide students in removing their weaknesses as reflected in the evaluation
  8. To provide a certain psychological security to the school staff, students, their parents and the community at large
  9. To help both teachers and students to see their objectives
  10. To help teachers improve, or change, their methods of teaching and evaluation.

Test construction and types

Test construction
Generally speaking, a test is an instrument or systematic procedure for measuring a sample of behavior. A test or examination is an assessment intended to measure a test-taker’s knowledge, skill, aptitude, physical fitness, or classification in many other topics (e.g., beliefs). A test may be administered verbally, on paper, on a computer, or in a confined area that requires a test taker to physically perform a set of skills. Tests vary in style, rigor and requirements. In the context of education, tests are constructed with questions

Tests should be constructed and administered in such a way that the scores (marks) they yield reflect the ability they are supposed to measure. The type of test to be constructed depends on the nature of the ability it is meant to measure and the purpose of the test. Certain types of educational tests can only be constructed by teams of suitably qualified and equipped researchers. The process of test construction is long and painstaking, for it involves creating large batteries of test questions in the particular area to be examined, followed by extensive trials to assess their effectiveness.

In this way, questions are eliminated which:

  • Do not discriminate or distinguish between students whose abilities are different
  • Are frequently misunderstood by students.
  • Have more than one correct answer.
  • Give an advantage to certain students on the basis of factors other than those being tested.

An ordinary teacher can help his or her pupils by using the different types of tests available, each for the particular purpose for which it is designed.
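
The first elimination criterion listed above, whether a question discriminates between students of different ability, is often checked in trials with an upper-lower group index. The Python sketch below is an illustration added here, not a procedure given in these notes: the function name is hypothetical, and placing the top and bottom 27% of students in the two groups is an assumed, commonly used convention.

```python
def discrimination_index(scores, item_correct, fraction=0.27):
    """Upper-lower group discrimination index for one question.

    scores: total test score of each student
    item_correct: True/False per student for this one question
    fraction: share of students in each of the two groups (assumed 27%)
    """
    if len(scores) != len(item_correct):
        raise ValueError("scores and item_correct must have the same length")
    n_group = max(1, round(len(scores) * fraction))
    # Rank student positions from highest to lowest total score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    upper = order[:n_group]
    lower = order[-n_group:]
    p_upper = sum(item_correct[i] for i in upper) / n_group
    p_lower = sum(item_correct[i] for i in lower) / n_group
    # Close to +1: strong students succeed and weak students fail.
    # Close to 0: the question does not separate the two groups.
    return p_upper - p_lower


# Hypothetical trial data for ten students.
total_scores = [95, 88, 80, 72, 65, 60, 52, 45, 38, 30]
answered_ok = [True, True, True, True, False, True, False, False, False, False]
d = discrimination_index(total_scores, answered_ok)
print(d)
```

A question that every student answers correctly gets an index of 0 under this sketch, which matches the idea that it does not distinguish between students whose abilities differ.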

The teacher therefore needs to construct tests that tell him or her:

  • What the pupils have learned from his or her teaching.
  • How well they can perform the practical skills he or she has taught them.
  • Whether they understand the underlying principles of what they are learning.
  • How quickly and accurately they can work.
  • How well they can apply what they know to problems they meet.
  • Whether they have yet developed the intellectual skills that older children can perform, such as the ability to analyze, deduce, compare, and evaluate.

Types of tests

1. Objective tests: These are also called selection-type tests, as the student has to select the answer. Preparing good selection-type items is difficult, and students can get a proportion of answers correct by guessing. They include the following:

  • True-false items
  • Matching items
  • Multiple choice items
  • Completion items

An objective test consists of factual questions requiring extremely short answers that can be quickly and unambiguously scored by anyone with an answer key. The answer called for may consist of one word, a phrase or a sentence.
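
The remark above that students can get a proportion of answers correct by guessing can be made concrete. The short Python sketch below is an added illustration, not part of the original notes; it assumes blind guessing, one key per item and equally likely options, and the function name is hypothetical.

```python
def expected_chance_score(n_items, n_options):
    """Expected number of correct answers from blind guessing.

    Assumes every item has n_options equally likely choices and one key.
    """
    if n_items < 0 or n_options < 2:
        raise ValueError("need n_items >= 0 and n_options >= 2")
    return n_items / n_options


# 20 true-false items versus 20 four-option multiple choice items.
print(expected_chance_score(20, 2))  # 10.0
print(expected_chance_score(20, 4))  # 5.0
```

Under these assumptions, adding options to each item lowers the score a student can expect from guessing alone.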

2. Subjective type: Also termed supply-type items, these include extended response items and restricted response items. Extended response items require lengthy responses that count heavily in scoring. These items focus on major concepts of the content unit and demand higher-level thinking. On restricted response items, examinees provide brief answers, usually no more than a few words or sentences, to fairly structured questions.
These items are easier to construct but more difficult to score. This type of test is evaluated by giving an opinion. They are more challenging and expensive to prepare, administer and evaluate correctly, though they can be more valid. They include short answer and essay questions.

Examinees must organize multiple ideas and provide supporting information for major points in crafting responses.

Advantages of restricted response items
a. They measure specific learning outcomes.
b. They are easier to assess.
c. They are more structured.
d. Any outcome measured by an objective interpretive exercise can be measured by a restricted subjective item.

Limitations of restricted response items
a. They restrict the scope of the topic to be discussed and indicate the nature of the desired response, which limits the student’s opportunity to demonstrate this behavior.

Advantages of Extended response items
I. They measure knowledge at the higher cognitive levels of educational objectives, such as analysis, synthesis and evaluation.
II. They expose individual differences in attitudes, values and creative thinking.

Limitations of extended response items
i. They are inefficient for measuring knowledge of factual material because they call for extensive detail in one selected content area at a time.
ii. Scoring is difficult and unreliable.

Instructions for writing objective questions

a. True-False items
Do not provide clues by using specific determiners such as ‘all’, ‘never’, ‘absolutely’ or ‘none’, because they signal that the statement is false. Words such as ‘may’, ‘perhaps’, ‘sometimes’ and ‘could’ signal that the statement is true. If such words are to be used, they must be balanced and used in both true and false statements. Statements must be irrevocably true or false, so they must be unambiguous (clear). Negative statements should be avoided. Limit each true or false statement to a single concept. True or false test items may require the learner to underline a word or clause in a statement, correct a false statement or trace a path in a maze.

Tips

  • each statement is unequivocally judged true or false
  • the statement is brief and stated in simple, clear language
  • negative statements are used sparingly and double negatives are avoided
  • the statements are free of clues to the answer (e.g. verbal clues, length)
  • there is approximately an equal number of true and false statements
  • the true and false items are arranged in random order

b. Matching items
These items involve connecting the contents of one list to the contents of another list. The learners are presented with two columns of items, for instance column A and column B. They are asked to match each item that appears in column A with an appropriate item from column B. In such questions, an equal number of premises (what is in the left-hand column) and responses may be provided; this is called balanced or perfect matching. When an unequal number of premises and responses is provided, this is called unbalanced or imperfect matching. To control guesswork, it is better to have more responses and fewer premises.

Tips

  • the items are based on homogeneous material
  • the instructions clearly state the basis for matching and that each response can be used once, more than once, or not at all
  • the items appear on the same page
  • an uneven match is provided by making the list of responses longer or shorter than the list of premises.

c. Multiple-Choice
It has three components.

  • Stem: A question or statement followed by a number of choices or alternatives that answer or complete the question or statement.
  • Alternatives: All the possible choices or responses to the stem.
  • Distracters (foils): Incorrect alternatives.

To prepare good multiple-choice tests, the teacher can draw up a table of specification showing the topics or subtopics and the skills to be tested. The table of specification comes from the subject syllabus. The test items should be based on the three domains of learning (cognitive, affective, psychomotor). The areas emphasized during teaching should have more items. Questions should be based on Bloom’s taxonomy: of the six levels of cognitive objectives, multiple-choice questions should mainly reflect comprehension, application and analysis. There should be minor doses of knowledge, synthesis and evaluation. Knowledge is too basic, while synthesis is too complex.
Allocation of marks for these skills can be as follows:

Knowledge-12%
Comprehension-16%
Application-32%
Analysis-20%
Synthesis-12%
Evaluation-8%
Total=100%
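
To turn the percentages above into marks for a particular paper, the arithmetic can be sketched in Python as follows. This is an added illustration: the percentages are copied from the allocation above, while the function name and the 50-mark paper are assumptions made for the example.

```python
# Percentages copied from the allocation above.
ALLOCATION = {
    "Knowledge": 12,
    "Comprehension": 16,
    "Application": 32,
    "Analysis": 20,
    "Synthesis": 12,
    "Evaluation": 8,
}


def marks_per_level(total_marks, allocation=ALLOCATION):
    """Split total_marks across the cognitive levels by percentage."""
    if sum(allocation.values()) != 100:
        raise ValueError("percentages must add up to 100")
    return {level: total_marks * pct / 100 for level, pct in allocation.items()}


# Hypothetical paper worth 50 marks.
plan = marks_per_level(50)
for level, marks in plan.items():
    print(f"{level}: {marks:g} marks")
```

For a 50-mark paper this gives 6, 8, 16, 10, 6 and 4 marks for the six levels in the order listed, which add up to 50.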

The stem of the question should state the problem clearly. It should not contain unnecessary information. Options should be carefully selected and must include the best answer or key.

Each question should be relevant and not far-fetched. All options should be almost equal in length. The distracters should be relevant and not far-fetched. Placement of the key should be unpredictable and should not follow a pattern. No item or option should provide clues to, or be the answer to, another question in the same test. The reading difficulty and vocabulary level of items should correspond to the level of the learners. All items should be independent. Avoid tricky questions. Ensure instructions to learners are clear. Edit the paper carefully.
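
One simple way to keep the placement of the key unpredictable is to shuffle the options of each item before the paper is printed. The Python sketch below is an added illustration under that assumption; the function name and the sample options are hypothetical, and the fixed seed is used only so that the example can be repeated.

```python
import random


def shuffle_options(options, key, rng=None):
    """Return the options in random order and the new position of the key."""
    if key not in options:
        raise ValueError("the key must be one of the options")
    if rng is None:
        rng = random.Random()
    shuffled = list(options)  # copy, so the teacher's original list is untouched
    rng.shuffle(shuffled)
    return shuffled, shuffled.index(key)


# Hypothetical item: "Which of these is a supply-type item?"
options = ["True-false", "Matching", "Essay", "Multiple choice"]
shuffled, key_pos = shuffle_options(options, key="Essay", rng=random.Random(7))
print(shuffled, "-> key is option", "ABCD"[key_pos])
```

Shuffling each item independently avoids a fixed pattern across the paper, although the teacher should still check the final answer sheet by eye.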

Tips

  • the stem of the item presents a single, clearly formulated problem
  • the stem is stated in simple, clear language
  • the stem is worded so that there is no repetition of material in the alternatives
  • the stem is stated in positive form wherever possible
  • if negative wording is used in the stem, it is emphasized in bold or by underlining
  • the intended answer is correct or clearly best
  • all alternatives are grammatically consistent with the stem and parallel in form
  • the alternatives are free from verbal clues to the correct answer
  • the distracters are plausible and attractive to the uninformed
  • to eliminate length as a clue, the relative length of the correct answer is varied
  • the alternatives “all of the above” or “none of the above” are used only when appropriate

d. Completion or short answer test items
In these items, learners are required to supply the words or figures which have been left out.
They may be presented in the form of questions or phrases to which a learner is required to respond with a word or several statements. Questions must be specific and unambiguous. Statements that leave out too many key words may not carry the intended meaning. If the answer is numerical or a quantity, the unit must be indicated. The answer required should be related to the main point of the statement. In constructing completion items, the blank should come last to ensure that the learners read the whole question before supplying the answer. Unintentional help should not be given in the question.

Subjective tests
Two types of responses are obtained under subjective tests: restricted and extended. As the names suggest, restricted response items call for short answers and extended response items call for lengthy answers. Certain precautions are to be observed while writing these types of questions.

  • Developing the prompt
    The prompt for a subjective item poses a question, presents a problem, or prescribes a task. It sets forth a set of circumstances to provide a common context for framing the response. Action verbs direct the examinee to focus on the desired behavior, for instance solve, interpret, compare and contrast, discuss or explain. Appropriate directions indicate the expected length and format of the response, allowable resources or equipment, time limits and the features of the response that count in scoring.
  • Creating the scoring rubric
    These are analytic or holistic in nature. For an analytic rubric, the item writer or constructor lists the desired features of the response, with a number of points awarded for each specific feature. A holistic rubric provides a scale for assigning points to the response based on overall impression. A range of possible points is specified and verbal descriptors are developed to characterize a response located at each possible point on the scale. Illustrative responses that correspond to each scale point are often developed or selected from actual examinee responses.
  • Scoring response
    During subjective scoring at least four types of rater error may occur. The rater becomes more lenient or severe over time, or scores erratically due to fatigue or distractions; has knowledge or beliefs about an examinee that influence perception of the response; is influenced by the examinee’s good or poor performance on previously scored items; or is influenced by the strength or weakness of a preceding examinee’s response.
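
The two ways of scoring described under the rubric heading above, points for each desired feature and a single scale point chosen by overall impression, can be sketched in Python as follows. This is an added illustration: the features, maximum points, descriptors and function names are all hypothetical.

```python
# Hypothetical feature-by-feature rubric: maximum points per desired feature.
FEATURE_RUBRIC = {
    "states the main idea": 2,
    "gives supporting information": 4,
    "organizes ideas logically": 2,
    "draws a conclusion": 2,
}

# Hypothetical overall-impression scale: one verbal descriptor per scale point.
IMPRESSION_SCALE = {
    4: "complete, well organized and well supported",
    3: "mostly complete, with minor gaps",
    2: "partial, with weak support",
    1: "minimal or off the point",
}


def feature_score(points_awarded, rubric=FEATURE_RUBRIC):
    """Add up the points for each feature, checking each against its maximum."""
    total = 0
    for feature, max_points in rubric.items():
        points = points_awarded.get(feature, 0)
        if not 0 <= points <= max_points:
            raise ValueError(f"{feature}: expected 0 to {max_points} points")
        total += points
    return total


def impression_score(scale_point, scale=IMPRESSION_SCALE):
    """Return the chosen scale point together with its verbal descriptor."""
    if scale_point not in scale:
        raise ValueError(f"scale point must be one of {sorted(scale)}")
    return scale_point, scale[scale_point]


# Hypothetical marks given to one essay response.
response = {
    "states the main idea": 2,
    "gives supporting information": 3,
    "organizes ideas logically": 1,
    "draws a conclusion": 2,
}
print(feature_score(response), "out of", sum(FEATURE_RUBRIC.values()))
print(impression_score(3))
```

Writing the maximum points and descriptors down before scoring begins gives every rater the same reference, which may help limit the rater errors listed above.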

As an example of extended response items, consider how essay test items are constructed:

Essay items require learners to write or type the answer in a number of paragraphs. The learners use their own words and organize the information or material as they see fit.

In writing essay tests, clear and unambiguous language should be used. Words such as ‘how’, ‘why’, ‘contrast’, ‘describe’ and ‘discuss’ are useful. The questions should clearly define the scope of the answer required.

The time provided for the learner to respond to the questions should be sufficient for the amount of writing required for a satisfactory response. The validity of questions can be enhanced by ensuring that the questions correspond closely to the goals or objective being tested. An indication of the length of the answer required should be given.

Tips
Short-Answer items

  • the item calls for a single, brief answer
  • the item has been written as a direct question or a well-stated incomplete sentence
  • the desired response is related to the main point of the item
  • clues to the answer have been avoided (e.g. “a” or “an”, length of the blank)
  • the units and degree of precision are indicated for numerical answers.

Essay questions

  • questions starting with “who”, “what”, “when”, “where”, “name”, “list” are avoided as these terms limit the response
  • questions demanding higher order skills, such as those indicated in the following table are used

Oral Tests: Oral tests require skilful questioning on the part of the teacher. The thought processes of the students are immediately detected in an oral test. The test can be taken in a relaxed state of mind by an extrovert, whereas the same technique can be frightening to a shy and withdrawn student. An oral test can save the time and cost of correcting it, but it will take considerable time to administer because it is to be administered individually. Writing skills cannot be evaluated in an oral test. Oral tests are useful for testing language proficiency in speaking and for following the thought processes of the students as they speak.

Oral tests, however, have the limitation of being highly subjective. The teacher’s questions, or his way of asking questions, can vary from student to student, and thus objectivity in evaluation is lessened. The answers cannot be compared between one student and another. Bias of the teacher can, therefore, easily influence the grading of answers and the final evaluation.

Oral examination should be confined to asking simple questions which can be answered quickly. The key of oral tests, or examinations, should be prepared beforehand so that as the test is being taken by the students, their answers can be compared with the key.

Example of oral questions
These oral questions are samples of tests given to college students who took a course in organization of secondary schools.

  1. Name three characteristics of effective teachers in relation to their teaching.
  2. Give the principle of planning educational programmes which is related to the people’s problems and needs.
  3. Name two roles of a good educational administrator.
  4. Why is teaching considered a profession?
  5. In which three terms can the space organization in secondary schools be described?

Key to Oral Questions

  1. (a) Competence in teaching. (b) Healthy relationship (rapport) with students. (c) Mastery over the subject.
  2. Educational programme planning is based on, and grows out of, the recognized problems and needs of the people.
  3. (a) Decision maker. (b) Leader.
  4. Because it is characterized by a long period of specialized formal training.
  5. A space equipped for specific activity and organized in terms of location, sanitation and usefulness.

Observation. The evaluative technique of observation requires the teacher to be an effective observer and recorder. Observations may be made of the performance of the students, their experiences and their problems. Observation, as an evaluative technique, is reliable when aspects to be observed are carefully selected, and systematic methods of observing and recording them are followed. Informal observations of students’ behaviour can contribute to the total assessment of students, but such observations cannot be regularly used for grading as the only technique. Observation guides, or schedules, should be prepared beforehand, and time-limits for systematic observation should be set in advance.
Observation can be used for measuring emotional and social adjustments. The technique of observation should be used only when the behaviour can be expressed overtly and can, therefore, be observed. In home science, for example, behaviours of students can often be overtly expressed and, therefore, can be observed. The students can be systematically observed in classrooms, laboratories, nursery schools, and home management houses.

One important condition of using observation as an evaluative technique is that every student must be observed under similar conditions. The teacher will, therefore, have to create situations which are fairly controlled so as to permit uniformity of conditions of observation. The teacher must be careful not to misinterpret what she has observed, and observations should preferably be recorded right on the spot when they are being made, or immediately thereafter. It is quite likely that the teachers’ prejudices, values and attitudes would influence their records of observations, and, therefore, the teacher needs to develop objectivity in recording observations. Firstly, the observations should be recorded as objectively as possible; and, secondly, they should be used with other evidence for evaluation of students.

Example of Observation Sheet for Observing Student Demonstration

Directions: Rate each point by placing a score in the column to the right. Use the number that best describes the student-demonstrator.

1. Was the purpose of the demonstration clear?
2. Was the demonstrator able to go through the demonstration with competence?
3. Were the steps in the process presented clearly, accurately, and in a connective order?
4. Were the essential points given proper emphasis?
5. Did the explanation of the steps in the process accompany the doing of them?
6. Were the equipment and supplies organized so that the work could be done in an effective manner?
7. Were all necessary materials prepared in advance?
8. Were the students able to see and hear during the demonstration?
9. Did the demonstration include a summary of the important points?
10. Was the speed of the demonstrator appropriate to ensure clarity and learning?

Rating Scales. Rating scales are used for evaluating characteristics of students which may be present in varying degrees. These scales describe the behaviour of students in several situations, and with varying degrees of traits. The teacher must compare the behaviour of each student with the descriptions on the rating scale instead of judging the student on the basis of performance; in other words, the teacher should guard against the ‘halo effect’ in using a rating scale.

“A rating scale is a device for systematically recording observer’s judgements concerning the degree to which a quality or trait is present.” The use of a rating scale as an evaluative device is most useful in areas where evaluation is based upon observational methods. Social behaviour, teaching competence, language competence and manipulative skills can be best measured by using rating scales. Specifically, the rating scales are used to improve teaching, to estimate worth, efficiency or quality of success, to study and prevent failures, to weigh qualities of success, to furnish a standard, to report progress, and to evaluate procedure, product and personal-social development.

In doing all these, the teacher is basically evaluating procedures, products, or personal-social development of students. Examples of these three kinds of evaluations, using rating scales, are as follows:

Procedure Evaluation. In many areas of learning, achievement is expressed directly in the performance of students. Typical examples of performance which can be studied through procedure evaluation include the ability to give a speech, to manipulate laboratory equipment, to work efficiently in a group, to cook, or to play a musical instrument. To judge the progress in such activities, the procedures used in the performance itself must be observed and rated.

Example of Rating Scale for Procedure Evaluation

Speech Rating Scale
Directions: Rate the pupil’s speaking ability by placing a cross (x) anywhere along the horizontal line, under each characteristic.

Product Evaluation. When performances of students result in some type of product which can be observed, it is frequently more desirable to judge the product rather than the procedure. The ability to write an essay, or to make a garment, is best evaluated by judging the essay, or the garment. In some areas of learning, such as typing, or cooking, it might be more desirable to rate procedures during the early phases of learning, and to evaluate products later, after the skills have been mastered. Product scales can be used in judging the quality of any product, but in most areas, teachers will need to develop their own scales.

Personal-Social Development Evaluation. Ratings in the area of personal-social development are typically obtained at periodic intervals and represent a kind of summing-up of the general impressions a teacher has formed about her students. The ratings are based on observations which tend to be casual and spread over an extended period of time. Such rating scales tend to be highly subjective as they reflect more of the teacher’s feelings and personal biases than the extent of the attributes evaluated. However, an important dimension of personal-social development, the type of impression a person makes upon others, can be measured by such scales.
Many types of rating scales have been developed. Because of many variations in form, content, and methods of construction, the classification of rating scales is difficult. Only three major types are discussed here:

Numerical Rating Scales. The numerical rating scales are the simplest types of rating scales. Raters check, or circle, a number to indicate the degree to which a characteristic is present. At times, numbers on the rating scales are given a verbal description which remains constant from one characteristic to another. In some cases, the rater is merely told that the largest number is the highest, ‘1’ is the lowest, and other numbers represent intermediate values.
Directions: Indicate the degree to which the pupil contributes to class discussion, by encircling the appropriate number. The numbers represent the following values:
5-Outstanding 3-Average 1-Unsatisfactory

  1. To what extent does the pupil participate in discussion?
  2. To what extent are the comments related to the topic under discussion?

Graphic Rating Scales. The distinguishing feature of the graphic rating scale is that each characteristic is followed by a horizontal line. The rating is made by placing a check on the line. Typically, a set of categories identifies specific positions along the line, but the rater is free to check between these points. When the same set of categories is used for each characteristic, the scale is called a Constant-Alternative Scale; when the categories vary, it is called a Changing-Alternative Scale.

Descriptive Graphic Scale. The descriptive graphic scale uses descriptive phrases to identify the points on a graphic scale. The descriptions are brief and convey, in behavioural terms, what students are like at different steps along the scale. In some scales only the centre and the end positions are described. In others, all positions, or the degrees of the characteristic in question, are described. A space for comments may also be provided. It is difficult to make this kind of scale, but it describes student behaviour in greater detail. It may also be difficult to observe all the traits, or even to evaluate all these traits. It would, therefore, be necessary to include such categories as ‘UNCERTAIN’ or ‘NO OPPORTUNITY TO OBSERVE’ in the scale. If such comments are made, the accuracy of the scale will increase.

Directions: Make your ratings on each of the following characteristics by placing an ‘X’ anywhere along the horizontal line, under each item. Give comments which help clarify your rating, in the space indicated.

Construction and Use of Rating Scales. The rating scales are very easy to use, but errors in their use are just as easy to commit. More than one person should, therefore, use the rating scales which are for evaluating students. Most teachers try to avoid using the extreme scores, or values, and keep using only the middle ones, which may produce a ‘halo effect’.
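
The advice above that more than one person should rate the same student, and the remark that raters tend to avoid extreme values, can be sketched in Python as follows. This is an added illustration that assumes a 1 to 5 numerical scale; the function names, item names and ratings are hypothetical.

```python
from statistics import mean


def combine_ratings(ratings_by_rater):
    """Average each item's rating across all raters who rated it.

    ratings_by_rater: {rater name: {item: rating on the numerical scale}}
    """
    items = set()
    for ratings in ratings_by_rater.values():
        items.update(ratings)
    return {
        item: mean(r[item] for r in ratings_by_rater.values() if item in r)
        for item in sorted(items)
    }


def avoids_extremes(ratings, low=1, high=5):
    """True if a rater never used the lowest or the highest scale value."""
    return all(low < value < high for value in ratings.values())


# Hypothetical ratings of one pupil by two teachers.
ratings = {
    "Teacher A": {"participation": 4, "relevance": 3},
    "Teacher B": {"participation": 5, "relevance": 3},
}
combined = combine_ratings(ratings)
print(combined)
print({name: avoids_extremes(r) for name, r in ratings.items()})
```

A rater who is flagged by `avoids_extremes` across many students is not necessarily wrong, but the pattern is worth discussing when ratings are compared.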

For making rating scales, the teacher must first prepare a list of behaviours to be evaluated, and get it scrutinized by other teachers who teach related courses. Opinions of experts in evaluation should also be obtained on the soundness of construction of the rating scale. The students may also be consulted if they have been taught about the use and construction of rating scales.

The course outline is an important source of material for making rating scales. Teachers should rely heavily on the course outline when selecting the items for their rating scales.

Too many items, overlapping items, and items of rare occurrence should be avoided. Such items make evaluation inaccurate and difficult to rate. Adequate description, or identification, of each behaviour to be rated is essential. Only observable behaviour of students should be evaluated by using the rating scales. Evaluation of products and procedures may be relatively easier than evaluation of social-personal relationships.

Assignment

  1. Prepare objective and subjective tests for any selected course, duly following the directions.
  2. Select any multiple choice question paper and check for following categorization. Comment based on the recommended proportion.
    S.No | Question    | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation
         | Total       |           |               |             |          |           |
         | Percentage  |           |               |             |          |           |
         | Recommended | 12%       | 16%           | 32%         | 20%      | 12%       | 8%
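
For the second task, the tally and the comparison with the recommended proportion can be sketched in Python as follows. This is an added illustration: the recommended percentages are those given in the table, while the function name and the 25-question classification are hypothetical.

```python
from collections import Counter

# Recommended percentages, copied from the table above.
RECOMMENDED = {
    "Knowledge": 12,
    "Comprehension": 16,
    "Application": 32,
    "Analysis": 20,
    "Synthesis": 12,
    "Evaluation": 8,
}


def compare_with_recommended(question_levels, recommended=RECOMMENDED):
    """Return {level: (count, actual %, recommended %)} for a question paper.

    question_levels: one cognitive level per question, as judged by the reader.
    """
    total = len(question_levels)
    if total == 0:
        raise ValueError("the paper has no questions")
    counts = Counter(question_levels)
    unknown = set(counts) - set(recommended)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    return {
        level: (counts[level], round(100 * counts[level] / total, 1), pct)
        for level, pct in recommended.items()
    }


# Hypothetical classification of a 25-question paper.
paper = (["Knowledge"] * 10 + ["Comprehension"] * 6
         + ["Application"] * 5 + ["Analysis"] * 4)
report = compare_with_recommended(paper)
for level, (count, actual, target) in report.items():
    print(f"{level}: {count} questions, {actual}% (recommended {target}%)")
```

In this made-up paper, knowledge questions take up 40% against a recommended 12%, and there are no synthesis or evaluation questions, which is the kind of observation the comment in the assignment asks for.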