
This brief post showcases some of the ways I’ve been experimenting with artificial intelligence (AI) chatbots like Claude and ChatGPT. After a few months of testing them on linguistic, coding, and mathematical tasks, I’ve settled on Claude. I know there are more recent versions of ChatGPT and other generative AI models like DeepSeek, but given my time constraints, I’m happy with what Claude is delivering. Feel free to implement some of these ideas with your favorite AI tool.
I’ve been using Claude’s Personal Pro plan, which costs me roughly R$ 120 per month. You don’t need a paid version to get exactly the same results I get, though, so don’t worry about having to spend money on it. The key to getting the best results is refining the prompts (commands) you give the chatbot. Beyond that, you need to be extra careful with inaccurate answers, overgeneralizations, and even outright fabrications; this happened to me many times with ChatGPT, which is the main reason I stopped using it. In short, it’s always best to double-check the results you obtain from an AI tool and discuss them with your teacher or a proficient user of English, especially on more intricate topics. I illustrate this with an example at the end of this post.
Simple tasks – exercises
This is one of the easiest, most practical applications. Say you want to explore a certain text or create an exercise to practice a specific language point: collocations in a text, for example, or the uses of articles (including the zero article used in generalizations). You just ask the chatbot to create it; the more specific you are, the better the results. A few examples:


In the example above, I first asked for 10 collocations. Only after analyzing them and weighing their relevance did I ask the chatbot to create an exercise. If you want something more specific, you can say so: “create a fill-in-the-gaps exercise,” for instance.
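
By the way, if you’d rather script this kind of request than use the chat interface, here’s a minimal sketch using Anthropic’s Python SDK. The model name, the placeholder text, and the prompt wording are my own choices, not the exact ones from the example above:

```python
# Minimal sketch of the same two-step workflow via Anthropic's Python SDK
# (pip install anthropic). Model name and prompts are illustrative choices.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TEXT = "...paste the source text here..."
ASK = f"List 10 collocations from this text:\n\n{TEXT}"

# Step 1: ask for the collocations only, so you can vet them first.
first = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # swap in whatever model is current
    max_tokens=1024,
    messages=[{"role": "user", "content": ASK}],
)
collocations = first.content[0].text
print(collocations)

# Step 2: only after reviewing the list, ask for the exercise itself,
# passing the earlier turns back so the model keeps the context.
second = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": ASK},
        {"role": "assistant", "content": collocations},
        {"role": "user", "content": "Create a fill-in-the-gaps exercise "
                                    "using these collocations."},
    ],
)
print(second.content[0].text)
```

Keeping the two requests separate mirrors the workflow above: you vet the collocations before committing to an exercise built on them.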


This other example is grammar-focused, based on a student’s original production. I wanted Claude to create more exercises to practice this particular feature of contrastive (concessive) subordinate clauses. Then I asked it to turn the exercise into an artifact (something I can publish directly from Claude). I stopped there, but a typical next step would be asking Claude to make it interactive, so students can type directly into the page and get the answers right away. The next example shows this:



The exercise above required more interaction between me and the AI tool because of implementation problems. First, the context: this was based on a Duolingo English Test question type that requires candidates to fill in the missing letters to complete a word in a sentence. From a linguistic point of view, it’s a great way of testing a student’s ability to recognize patterns (e.g., collocations). As you can see in the first image, the result was frustrating. I complained and asked Claude to try again. The result (on the right side of the middle image) was even worse. So I emphasized that the exercise had to be like the original one. The final result was almost identical to the Duolingo test. And interactive. You can play with it yourself by clicking here. In case you need the answers, here they are: catch, practice, contact, submit, regulate, determine, implement, evaluate, investigate, and scrutinize.
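
For the curious, the checking logic behind this kind of interactive exercise is quite simple. Here’s a rough Python sketch; the visible stems are my own guesses, since only the full answers appear above:

```python
def check(stem: str, typed: str, answer: str) -> bool:
    """The candidate sees the first letters of a word (the stem) and types
    the rest; the attempt is correct only if stem + typed letters spell
    the target word exactly."""
    return (stem + typed).strip().lower() == answer.lower()

# Answers taken from the exercise above; the stems are hypothetical.
print(check("ca", "tch", "catch"))        # True
print(check("sub", "mit", "submit"))      # True
print(check("eval", "uete", "evaluate"))  # False: misspelled completion
```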
This last example takes a student’s original production (passages from a Master’s thesis) and asks Claude not only to identify the uses of articles but also to explain the relevant rules. This way, I avoided handing over a long, generic grammar section on article usage from a course book and focused on the specific cases relevant to my student’s context.

Correction and review tasks
Another valuable use of these chatbots is text correction and review. For me, this is a critical feature because I prepare a lot of candidates for exams that use AI scoring engines themselves. That’s the case for both the TOEFL iBT and the Duolingo English Test. Apart from using human raters (in TOEFL’s case), both exams have their own AI-powered engines that score candidates’ written production, such as essays and short texts. So I do the same: I compare my own correction process and grade with the AI’s output. What is paramount here is providing all the same elements human raters would have in hand when correcting a student’s essay: the task with any supporting materials, the scoring criteria/rubric or guidelines, and, of course, the student’s writing. Here are two examples:
In the first image, I provided the full task prompt with its supporting materials: texts, audio transcripts (for example, the TOEFL iBT has a writing task that provides both a short reading passage and an excerpt of a lecture on the same topic), and questions. I then provided all the scoring criteria from the official test manual, word for word, including the rubric descriptors for a response that would get a 10 and one that would get a 0. In the second image, I provided my student’s answer to the task, specifically asking the AI tool to rate it and give feedback. That’s what you can see in images 3 and 4: first, detailed feedback covering the major aspects described in the scoring criteria (e.g., content, delivery, topic development, language), and then a score, with a justification for its decision. Based on that, I compare it with my own judgment and arrive at a final score, much like the official test procedure.
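
If you wanted to automate this comparison across many essays, the same request can be scripted. Here’s a minimal sketch via Anthropic’s Python SDK; the placeholders stand in for the materials described above (task prompt, official scoring criteria, and the student’s response), and the prompt wording is mine, not the exact wording from the screenshots:

```python
# Sketch: rate one essay against the official rubric in a single request.
import anthropic

client = anthropic.Anthropic()

TASK = "...the full task prompt, reading passage, and lecture transcript..."
RUBRIC = "...the official scoring criteria, word for word..."
ESSAY = "...the student's response..."

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": (
            f"Task:\n{TASK}\n\n"
            f"Scoring criteria:\n{RUBRIC}\n\n"
            f"Student response:\n{ESSAY}\n\n"
            "Rate this response strictly against the criteria, justify the "
            "score, then give detailed feedback on content, topic "
            "development, and language."
        ),
    }],
)
print(message.content[0].text)
```

The point is the same one made above: the model only rates well when it is handed everything a human rater would have, so all three blocks go into the prompt.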
The second example is a bit different: in this case, I asked Claude to first create a realistic response to the task (simulating what a human candidate would be capable of doing in a real-life scenario) and then evaluate that response. This is particularly useful in tasks that require oral or written responses to a visual stimulus. For the Duolingo test, I didn’t have access to any official preparation materials, so to practice this question type (“Speak about the photo”), I had to formulate a response myself that I could give students as a sample or reference. Then the AI came to my rescue:
First, I provided a random photo (I found one on Google) and the directions/prompt. Then, as you can see in the top-right image, I asked Claude to create two samples (as transcripts) aimed at the B1 and B2 levels. Finally, I provided the scoring criteria and asked the chatbot to evaluate its own simulated responses. Of course, this particular AI tool can’t evaluate features such as pronunciation and intonation, but it can evaluate all the other aspects.
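
This two-turn flow (generate samples, then grade them) can also be scripted. Here’s a sketch via Anthropic’s Python SDK; the file name, prompts, and criteria placeholder are my own assumptions, and the photo goes in as a base64-encoded image block:

```python
# Sketch: generate two graded sample transcripts for a photo, then have the
# model evaluate its own output against the rubric.
import base64

import anthropic

client = anthropic.Anthropic()

with open("photo.jpg", "rb") as f:  # hypothetical local copy of the photo
    photo = base64.standard_b64encode(f.read()).decode()

RUBRIC = "...the official scoring criteria for 'Speak about the photo'..."

first = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/jpeg",
                        "data": photo}},
            {"type": "text",
             "text": "Write two sample spoken responses (as transcripts) "
                     "describing this photo: one at B1 level, one at B2."},
        ],
    }],
)
samples = first.content[0].text

second = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (f"Evaluate these two sample responses against the "
                    f"criteria below.\n\nResponses:\n{samples}\n\n"
                    f"Criteria:\n{RUBRIC}"),
    }],
)
print(second.content[0].text)
```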
A word of caution
To finish this post, I want to share an interaction I had with Claude the other day. The issue was quite simple from my perspective: a sentence from a TOEFL ITP test (from the Structure and Written Expression section) contained a mistake, and the student’s job was to identify it. My student got the right answer, but she wasn’t sure about the reason. Since she is an English teacher herself, I wanted to provide the most thorough explanation I could. What happened helped me realize we always need to be careful when reading and accepting AI-generated content. Let’s analyze it together:
The mistake

As you can see, Claude’s response identified the mistake (“plant grassy growth” is wrong) but gave a very different answer from what I expected. To me, it was obvious that “grassy” is an adjective that should be modifying “plant growth,” a compound noun. Claude decided to throw the baby out with the bathwater.
My not-so-useful reaction
I reacted by asking whether the sentence couldn’t be written differently while preserving all three elements (grassy, plant, growth), which still seemed to me the right construction. My phrasing was a bit unfortunate, though. Anyhow, Claude kept trying to eliminate one of the words.

Let’s get grammarish

I decided to be honest and express my reservations about its reasoning. Claude then realized I was talking about “grassy” as a noun modifier, but fell short of grasping that “plant growth” could be treated as a compound noun. It offered alternatives that didn’t satisfy me, but the last one (“gramineous plant growth”) was exactly what I had in mind. So why not “grassy plant growth”? Claude was not on the same page as I was.
More grammatical debates
I thought Claude had already figured out that both “gramineous” and “grassy” were adjectives modifying a compound noun. But it hadn’t. The chatbot reached for other corners of English grammar to try to convince me I was wrong, as you can see in the images below. Again, I didn’t buy it. Then came a revelation.


Claude admits it was wrong

After being confronted again with its double standard regarding “grassy” and “gramineous,” the chatbot admitted it had been inconsistent. Read its statements: “You’ve made me realize I was being inconsistent in my reasoning!” and “Thank you for pushing back – you helped me see that I was making artificial distinctions and not applying the same structural analysis consistently across similar constructions.”
Happy ending?

As much of a relief as Claude’s admission might sound, it was a major red flag for me. I had assumed the chatbot would be able to stand by its rationale in the face of my counterarguments, but that was not the case. It also admitted that it had no access to a linguistic corpus and couldn’t be sure which construction was most typical. In the end, this meant I was the one who had to make the decision.
This leads us back to the section’s title: a word of caution. Because I’m an experienced teacher, I was able to disagree with the AI’s grammar-based arguments and push it to the point of realizing it was being inconsistent. Moreover, I had to rely on my own experience and tools to verify that “grassy plant growth” is a valid, common construction in authentic materials. I could do so because I know where to check (e.g., COCA, but also news websites, official government publications, and academic journals). Here are the three sources I used to confirm this collocation pattern: Wisconsin’s Wetlands Association, The Malibu Times, and the Largo (Florida) Code of Ordinances.
My final interaction was to tell Claude I had done the homework and confirmed my reasoning. The chatbot reflected on it, which is a good sign, but the bottom line is that you should always take its outputs with a grain of salt. If you’re going to use AI tools such as ChatGPT, Claude, DeepSeek, and others to study, bear in mind what I said at the beginning of this post: it’s always best to double-check and discuss the results with your teacher or a proficient user of English. Happy studies!