Criterion: Research on Using Criterion in Your Classes
We are fortunate that one of our colleagues studied the use of Criterion in her engineering classes. Dr. Backer's white paper, "Effectiveness of an Online Writing System in Improving Students' Writing Skills in Engineering," is available for download here: Criterion CoED Submission Draft 2 BW Backer.pdf. My excerpts are below.
Basic Overview
When students receive their diagnostic feedback in Criterion, they get a link which opens the specific section related to the marked error.
The five scoring categories under Trait Feedback Analysis are:
- Grammar score – based on errors such as those in subject-verb agreement among others
- Mechanics score – based on errors in spelling and similar mechanical errors
- Usage score – based on errors such as article errors and confused words (e.g., homophones: using "to" where "too" was intended)
- Style score – based on instances of overly repeated words and the number of very long or very short sentences as well as other such features
- Organization/development score – based on the identification of sentences that correspond to the background, thesis, main idea, supporting idea, and conclusion
Criterion will indicate potential errors under Grammar in nine areas: Fragment or Missing Comma, Run-on Sentences, Garbled Sentences, Subject-Verb Agreement, Ill-Formed Verbs, Pronoun Errors, Possessive Errors, Wrong or Missing Word, and a special category called Proofread This!
Criterion will indicate grammatical usage errors in eight subareas: Wrong Article, Missing or Extra Article, Confused Words, Wrong Form of Word, Faulty Comparisons, Preposition Error, Nonstandard Word Form, and Negation Error.
The subareas included in Mechanics are: Spelling, Capitalize Proper Nouns, Missing Initial Capital Letter in a Sentence, Missing Question Mark, Missing Final Punctuation, Missing Apostrophe, Missing Comma, Hyphen Error, Fused Words, Compound Words, and Duplicates.
Issues
Non-fixable spelling errors. If students cited an author in their papers or included highly technical words, Criterion would sometimes flag an error. As these Criterion “errors” did not need to be fixed, students were not penalized for them.
For this study, the last two Criterion categories (Style and Organization & Development) were not assessed. The Style category gives students feedback in six areas: Repetition of Words, Inappropriate Words or Phrases, Sentences Beginning with Coordinating Conjunctions, Too Many Short Sentences, Too Many Long Sentences, and Passive Voice.
The Organization and Development category gives students feedback in eight subareas: Introductory Material, Thesis Statement, Topic Relationship & Technical Quality, Main Ideas, Supporting Ideas, Conclusion, Transitional Words and Phrases, and Other (see Figure 7). The Organization & Development category is based on the assumption that the student will write a standard, short, five-paragraph essay.
If the essay is designed to be under 1,000 words, the students can get a holistic score (from 1 to 6 with 6 being the highest score). As the requirements of the Tech 198 assignments were to write essays of at least 1,250 words, the instructor used the other feedback feature of Criterion, Trait Feedback Analysis. For this analysis, the instructor assessed the students on Grammar, Usage, and Mechanics.
Summary
Due to the history of low writing skills in the Tech 198: Technology and Civilization course at SJSU, the Fall 2012 instructor of Tech 198 piloted the ETS Criterion Online Writing Evaluation Service for the two research papers required in the class. The hypothesis was that using ETS Criterion would improve students’ writing, thereby reducing the time required to grade the research papers. Overall, this research shows that adopting Criterion has better served students who previously had difficulty writing. The tools that Criterion offers allow students to receive real-time feedback on their submitted work, access detailed descriptions of their mistakes, and revise their essays in a timely manner, thereby improving the efficiency of the instructor and the confidence and writing capacity of the student.
This is evidenced at San Jose State University by the comparison between the Spring and Fall 2012 sections of Tech 198: Technology and Civilization. Based on the data collected, with the introduction of Criterion in the fall, students reduced the number of grammatical errors in their assignments and earned higher grades on their research papers than students in the Spring 2012 semester. Given these positive results, the instructor advised the General Education committee to continue the use of Criterion in the Tech 198 course, as well as extend it to other SJSU classes.
Other Evaluations of Criterion
Language Learning and Technology
June 2012, 16(2), 38-45
Review by Hyojung Lim and Jimin Kahng
Get the full PDF here: http://llt.msu.edu/issues/june2012/review4.pdf
Excerpts:
Limitations
E-rater [E-rater is an automated essay scoring system that Criterion uses to evaluate submitted essays. The system is based on natural language processing to extract features from essays and to predict, statistically, what human raters would assign as holistic scores] has several limitations. The system does not cover all the constructs of writing. It emphasizes writing quality over content, focusing on linguistic accuracy and text structures. It assesses very little in terms of argumentation or coherence. Although the system notices when an essay is written on a topic irrelevant to the given prompt, it cannot analyze argumentation, logic, or coherence as human raters do. Moreover, the level of accuracy in error detection is often not satisfactory from the perspective of teachers and learners (p. 42).
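The bracketed description of e-rater above (extract surface features from an essay, then statistically predict the holistic score a human rater would assign) can be illustrated with a toy sketch. The features and hand-picked weights below are illustrative assumptions only; they are not e-rater's actual features, model, or weights.

```python
import re

def extract_features(essay: str) -> dict:
    """Toy feature extraction: word count, average sentence length,
    and vocabulary diversity (type/token ratio)."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_words = len(words)
    return {
        "word_count": n_words,
        "avg_sentence_len": n_words / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(n_words, 1),
    }

def holistic_score(essay: str) -> int:
    """Toy linear model: a weighted sum of features, clipped to the
    1-6 holistic scale Criterion reports. Weights are arbitrary
    illustrations, not trained values."""
    f = extract_features(essay)
    raw = (1.0
           + 0.004 * f["word_count"]
           + 0.05 * f["avg_sentence_len"]
           + 2.0 * f["type_token_ratio"])
    return max(1, min(6, round(raw)))
```

Note that a model of this shape can only reward surface properties it measures; this is exactly why, as the review states, such a system says little about argumentation, logic, or coherence.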
CONCLUSION
Criterion is fast, automatic, and objective. It can lighten teachers’ workloads and potentially amplify learner opportunities for practice. Criterion scores are highly correlated with human raters’ holistic scores. For teachers and students to make the most of the program, however, they should be critical consumers; it does not evaluate content, argumentation, or coherence. Its error detection has limitations in that it misses many errors that can be detected by human raters. Despite the shortcomings, Criterion can be a useful educational tool, especially if it is used by motivated students and a well-informed writing instructor.
Criterion’s online references, interactive feedback, and digital records of learner performance can help augment L2 learners’ metacognitive, L2-writing knowledge. But more empirical, evaluative studies of this type of software are necessary so teachers and learners can understand more concretely the best way to use Criterion.
Warschauer, M. and Grimes, D. (2008). Automated writing assessment in the classroom
Pedagogies: An International Journal, 3: 22–36, 2008
DOI: 10.1080/15544800701771580
Get the full PDF here: http://www.education.uci.edu/person/warschauer_m/docs/awe-pedagogies.pdf
Excerpts:
All the teachers we interviewed and observed indicated that the program helps to save them time, whether outside of class (when they let the AWE program handle part of their grading) or inside of class (when students work more independently with the AWE program, allowing the teacher to provide more attention to individual students). Yet we saw little evidence that students used AWE to write substantially more than they had previously. In most cases, the main factor limiting how much writing teachers assigned was not their time available to grade papers but rather students’ time available to write papers, and that was not increased by the use of AWE.
An insufficient number of relevant prompts also limited how much teachers could use AWE for graded writing practice.