- Authors: Sylviane Granger, Gaëtanelle Gilquin, Fanny Meunier
- Publisher: Cambridge University Press
- Release date: 2015/10/01
- Format: Hardcover
I had the opportunity to summarize chapter 1 and chapter 25 of this book, so I am reproducing my notes below.
Granger, S., Gilquin, G., & Meunier, F. (2015). Introduction: learner corpus research – past, present and future. In S. Granger, G. Gilquin & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 1-5). Cambridge University Press.
Leacock, C., Chodorow, M., & Tetreault, J. (2015). Automatic grammar- and spell-checking for language learners. In S. Granger, G. Gilquin & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 567-587). Cambridge University Press.
- Introduction: learner corpus research – past, present and future
Learner corpus research (LCR) emerged in the late 1980s.
Access to electronic collections of L2 data offers two advantages.
・They are more representative than smaller data samples.
・The data can be analyzed with a whole battery of software tools.
Cf. POS taggers and concordance programs
The field of learner corpus research has undergone remarkable developments.
・137 learner corpora (Learner corpora around the world)
82 (60%) L2 English, the rest focusing on other languages
The dominant focus is on writing (essay writing)
・Research design (longitudinal data)
Paquot, M., & Plonsky, L. (2015). Quantitative research methods and study quality in learner corpus research. LCR 2015. https://twitter.com/mrkm_a/status/642802550928998400
The handbook is subdivided into five main parts:
- Learner corpus design and methodology
- Analysis of learner language
- LCR and SLA
- LCR and language teaching
- LCR and NLP
A number of issues
Recommended key readings
- Automatic grammar- and spell-checking for language learners
Granger and Meunier (1994): grammar- and spell-checking as a promising application for learner corpus research.
There is a complex relationship between automated error-correction systems and learner corpora.
・Some systems require large amounts of error-annotated learner writing.
2 Core issues
2.1 Brief background on grammatical error correction
Published research first appeared in the 1980s.
Cf. Grammar Writer’s Workbench
The approach began to shift from rule-based to statistical in the mid-1990s.
⇔Almost all error-correction systems still make use of at least some rules.
2.2 Brief background on spelling-error correction
Kukich (1992) identified three strands of research.
(1) non-word error detection
(2) isolated-word error correction
(3) context-dependent error correction
Cf. Edit distance is "a measure of the similarity (or dissimilarity) of two strings, obtained by calculating, as a distance, how much one string has to be edited to produce the other" (Tono & Mochizuki (投野・望月), 2013, p. 74)
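The edit-distance definition above can be made concrete with a short sketch. This is the standard Levenshtein variant (insertions, deletions and substitutions each cost 1); the chapter may have other weightings in mind, and the helper `rank_suggestions` and the toy lexicon are hypothetical names introduced here only for illustration.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn source into target."""
    # previous[j] = distance between the source prefix processed so far and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def rank_suggestions(misspelling, lexicon):
    """Hypothetical helper: order candidate words by edit distance
    to the misspelling (ties broken alphabetically)."""
    return sorted(lexicon, key=lambda word: (edit_distance(misspelling, word), word))


print(edit_distance("kitten", "sitting"))
print(rank_suggestions("recieve", ["receive", "recipe", "relieve"]))
```

With this toy lexicon, "relieve" (one substitution away) is ranked ahead of "receive" (two edits away, since plain Levenshtein counts a transposition as two edits). This is a small illustration of why isolated-word correction alone can miss the intended word and why context-dependent correction is treated as a separate strand.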
2.3 The needs of L2 learners
From researchers’ pedagogical experience to learner corpora such as the Cambridge Learner Corpus
→The most common error is content word choice.
Rimrott and Heift (2008) evaluated the helpfulness of generic spell-checkers for L2 learners.
The spelling errors were classified as lexical, morphological and phonological.
For 62% of the learners’ errors, the intended word was among the suggested corrections provided by Microsoft Word.
2.4 The importance and design of learner corpora
2.4.1 Annotation of grammatical errors in learner corpora
Gamon’s (2010) research
・Errors are often ambiguous.
→researchers have often used learner text that is annotated for only a single targeted type of error.
The cost of developing the corpus was quite high.
→Use the error-detection system to output the errors it has found in learner text, then ask one or more annotators to verify the output.
⇔Whenever the system is modified, its output is likely to change.
⇔It cannot be used for calculating recall.
・Judgments of usage errors are not as clear-cut as those of grammatical errors.
→Using crowdsourcing to annotate learner errors.
・Errors often appear in ‘noisy’, error-ridden contexts.
→measuring the edit distance
2.4.2 Annotation of spelling errors in learner corpora
Bestgen and Granger (2011): identifying the categories of errors that affect essay scores.
Flor and Futagi (2012, 2013); Flor (2012): developing algorithms for spelling correction.
- Helping Our Own 1 (HOO-1)
- Helping Our Own 2 (HOO-2)
- 2013 conference on Computational Natural Language Learning (CoNLL 2013)
- 2014 conference on Computational Natural Language Learning (CoNLL 2014)
Cf. EDCW (Error Detection and Correction Workshop) 2012
3 Representative studies
A brief overview of two commonly used techniques: machine-learning (ML) statistical classifiers and language models.
machine-learning (ML) statistical classifiers: supervised learning
language models: unsupervised learning
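To make the language-model side of this contrast concrete, here is a minimal sketch of a bigram model with add-one smoothing, trained on unannotated text and used to compare two candidate sentences. The four training sentences, the function names and the smoothing choice are all assumptions made for illustration; the systems discussed in the chapter are trained on far larger corpora and need not work this way.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count unigram contexts and bigrams in raw (unannotated) sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


# Toy training text, invented for this sketch (no error annotation needed).
model = train_bigram_model([
    "she arrived at the station",
    "he is good at tennis",
    "they arrived at the airport",
    "we live in the city",
])
for candidate in ("she arrived at the airport", "she arrived in the airport"):
    print(candidate, round(log_prob(candidate, model), 3))
```

The candidate with "at" scores higher because "arrived at" occurs in the training text while "arrived in" does not. A supervised classifier would instead need training examples labelled with the correct preposition, which is the distinction the two lines above draw.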
3.1 Tetreault and Chodorow (2008)
TASK: 34 most frequent prepositions
Training data: about 7 million prepositions from the Lexile corpus (fiction, non-fiction and textbooks).
RESULTS: 84% precision, almost 19% recall.
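For readers less familiar with the two figures in the RESULTS lines, the sketch below shows how precision and recall are computed from counts of true positives, false positives and false negatives. The counts in the example are invented to show a high-precision, low-recall system; they are not taken from any of the studies summarized here.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for an error-detection system.

    precision = correctly flagged errors / all flagged items
    recall    = correctly flagged errors / all real errors in the text
    """
    flagged = true_positives + false_positives
    real_errors = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / real_errors if real_errors else 0.0
    return precision, recall


# Invented counts: the system flags 10 prepositions, 9 of them are real errors,
# and it misses 41 other real errors in the same text.
precision, recall = precision_recall(true_positives=9, false_positives=1, false_negatives=41)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that false negatives can only be counted when the text has been annotated exhaustively. This is why verifying a system's output alone, as mentioned in 2.4.1, yields precision but cannot be used for calculating recall.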
3.2 Han, Tetreault, Lee and Ha (2010)
TASK: preposition-error identification and correction
Data: an error-tagged corpus of essays written by EFL (English as a foreign language) students in South Korea (111,000 essays)
Training data: about 1 million cases of preposition usage from the data.
RESULTS: 93% precision, 15% recall.
3.3 Rozovskaya and Roth (2010)
Developed four methods for artificially introducing article errors into training data.
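The general idea of artificial error generation can be sketched as follows. This is only a uniform-rate illustration and should not be read as any of the four methods in Rozovskaya and Roth (2010); the function name, the confusion set and the `error_rate` parameter are assumptions introduced here.

```python
import random

ARTICLES = ("a", "an", "the")
OMITTED = ""  # stands for a dropped article


def inject_article_errors(tokens, error_rate, rng):
    """Return (observed, correct) pairs for each token.

    Each article is replaced, with probability error_rate, by a different
    article or by OMITTED; all other tokens are left unchanged.
    """
    pairs = []
    for token in tokens:
        if token.lower() in ARTICLES and rng.random() < error_rate:
            alternatives = [a for a in ARTICLES + (OMITTED,) if a != token.lower()]
            pairs.append((rng.choice(alternatives), token))
        else:
            pairs.append((token, token))
    return pairs


rng = random.Random(0)  # fixed seed so the corruption is reproducible
tokens = "She bought a book at the station".split()
for observed, correct in inject_article_errors(tokens, error_rate=0.5, rng=rng):
    print(repr(observed), "<-", repr(correct))
```

The (observed, correct) pairs play the role of error-annotated training data: the observed token is what a classifier sees, and the correct token is the label it should learn to recover. This sketch only corrupts existing articles and does not model inserting a superfluous article where none belongs.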
Cf. GenERRate (http://www.computing.dcu.ie/~jfoster/resources/genERRate.html)
3.4 Mitton and Okada (2007)
TASK: Developed an algorithm for a spell-checker
RESULTS: The rate at which the intended word was the top suggestion rose from 61.2% to 65.8%, among the top three suggestions from 73.3% to 78.7%, and among the top six suggestions from 77.9% to 83.5%.
4 Critical assessment and future directions
There has been an immense amount of research into the development of grammatical error correction systems.
・There is a need for efficient and reliable annotation of learner corpora for system training and evaluation.
・There is also a need to develop error-correction resources for learners of other languages.
・Error-detection systems should be tailored to the native language of the writer.
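Tsujii, J. (辻井潤一) (2012). 「合理主義と経験主義のはざまで―内的な処理の計算モデル―」 [Between rationalism and empiricism: computational models of internal processing]. Journal of the Japanese Society for Artificial Intelligence (人工知能学会誌), 27(3), 273-283.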