Today I attended another CLIP colloquium at the University of Maryland:
Feb 22: Rebecca Hwa, The Role of Machine Translation in Modeling English as a Second Language (ESL) Writings
She addressed these research questions:
1. How patterned are the errors of English language learners?
1a. Could ‘English with mistakes’ be used as an input for machine translation?
1b. Could that be used to improve MT output?
1c. Could these findings be used for EFL training?
Her presentation made me think a lot about the role of linguistics in this type of work and about the nature of English.
First, I am coming to firmly believe that the best text processing is done in partnership between linguists and computer scientists. Linguistics provides the most thorough and reliable framework for computer scientists to work from, and once you lose sight of the nature of what you’re trying to represent, you end up astray.
So, for example, in the first part of her research presentation she talked about a project involving machine translation and English language learners of all backgrounds. One woman in the audience kept asking questions about the lumping together of non-native English speakers, and I assumed she was from the English department. The issue of mistakes in language use is a huge one, and a focus has to be chosen for the work. Language background might be a more productive way to narrow that focus, since it would allow for much more specific structural guidance and would draw on existing bodies of knowledge about language interference.
Second, she spoke about Chinese English language learners in particular and her investigation of lexical choice. Often English language learners’ written English is marked by lexical choices that appear strange to native English speakers. Her hypothesis was that the words used in place of the correct words were similar to them in some way, most likely in the contexts where they appear. She played a lot with the definition of context: was it proximity? Was it a specific grammatical relationship? This discussion was fascinating, but it probably could have benefited from some restrictions on the types of errors she was targeting. Again, this is a view from the linguistics end of the spectrum between linguistics and computer science.
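To make the "context as proximity" idea concrete for myself, here is a minimal sketch of how two words could be compared by the words that appear near them. This is my own toy illustration, not the method from the talk: the function names, the window size, and the three-sentence corpus are all assumptions made up for the example.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            # Neighbors on both sides of position i, excluding the word itself.
            neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(neighbors)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: "take a decision" stands in for a learner's word choice.
TOY_CORPUS = [
    "she will make a decision tomorrow",
    "she will take a decision tomorrow",
    "he will eat a sandwich tomorrow",
]

vectors = context_vectors(TOY_CORPUS, window=2)
sim_make_take = cosine(vectors["make"], vectors["take"])
sim_make_sandwich = cosine(vectors["make"], vectors["sandwich"])
print(sim_make_take, sim_make_sandwich)
```

In this toy corpus "make" and "take" share exactly the same neighbors, so they score as more similar than "make" and "sandwich". Defining context as a grammatical relationship instead would mean replacing the sliding window with pairs taken from a parser, which is beyond this sketch.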
Her talk also made me think a lot about the nature of English. I often think about what it means to be a global language. English is spoken in many places where there are no native speakers, and it is spoken natively in many places that we don’t traditionally think of as English-speaking. Often the English that arises from these contexts is judged to be full of errors, but I don’t necessarily agree with this. Instead, I would ask two questions:
1. Is the variation patterned?
2. Is communication successful?
If the answer to both questions is yes, then I don’t think that the speaker is producing errors so much as a different variety of English. Varieties of English are not all treated with the same respect, but I suspect that the reasons behind this have more to do with the prejudices of the person judging the grammar than with any deficiency on the part of the speaker.