Probing Chain-of-Thought (ProCoT): Stimulating critical thinking and writing of students through engagement with large language models

Tosin Adewumi; Lama Alkhaled; Claudia Buck; Sergio Serrano Hernández; Saga Brilioth; Mkpe Kekung; Yelvin Ragimov; Elisa Barney

doi:10.33902/jpsp.202536789

Tosin Adewumi ¹*, Lama Alkhaled ¹, Claudia Buck ¹, Sergio Serrano Hernández ¹, Saga Brilioth ¹, Mkpe Kekung ¹, Yelvin Ragimov ¹, Elisa Barney ¹

¹ Luleå University of Technology, Sweden
* Corresponding Author

ARTICLE INFO

Journal of Pedagogical Sociology and Psychology, 2025 - Volume 7 Issue 4, pp. 227-245
https://doi.org/10.33902/jpsp.202536789

Article Type: Research Article

Published Online: 24 Oct 2025

ABSTRACT

We introduce a novel writing method called Probing Chain-of-Thought (ProCoT), which potentially prevents students from cheating with a large language model (LLM) while enhancing their critical thinking. Large language models have disrupted education and many other fields. For fear of students cheating, many educationists have resorted to banning their use. We conduct studies in two different courses with 65 students, using primarily a qualitative (phenomenological) research design, supplemented by quantitative methods. The students in each course were asked to prompt a large language model of their choice with one question from a set of four (random) questions and were required to affirm or refute statements in the model's output, using peer-reviewed references as evidence. In addition, the rubric for assessing the students' writing included five more criteria: focus, logic, content, style and correctness. The average success rate of the students' writing based on the criteria for the two cases is 79.49% (±12.82%). The results of the rubric assessment show two things: (1) Probing Chain-of-Thought stimulates critical thinking and writing of students through engagement with large language models, as seen when we compare the LLM-only output to the Probing Chain-of-Thought output, and (2) Probing Chain-of-Thought may prevent cheating because of clear limitations in the large language models concerned, as seen when we compare the students' Probing Chain-of-Thought output to the large language models' Probing Chain-of-Thought output. In the quantitative analysis, we also find that most students prefer to give answers in fewer words than large language models, which are typically verbose. The average word counts for students, ChatGPT 3.5, and Phind (v8) in the first course are 208, 391 and 383, respectively, while those for students, ChatGPT 3.5, and BingAI in the second course, where we enforced a minimum word count of 300 for the students, are 405, 356 and 315, respectively.
We provide access to the outputs for possible assessments (available after review).
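
The word-count comparison described above could be reproduced along the following lines. This is only a minimal sketch: the abstract does not say how words were counted, so whitespace tokenization is an assumption, and the function names and the toy texts are hypothetical placeholders rather than study data.

```python
from statistics import mean

# Minimum word count enforced for students in the second course (from the abstract).
MIN_WORDS = 300


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed counting rule)."""
    return len(text.split())


def average_word_count(answers: list[str]) -> float:
    """Mean word count over a group of answers (students or one model)."""
    return mean(word_count(answer) for answer in answers)


def meets_minimum(text: str, minimum: int = MIN_WORDS) -> bool:
    """Check whether an answer satisfies the enforced minimum length."""
    return word_count(text) >= minimum


if __name__ == "__main__":
    # Hypothetical toy texts, not taken from the study outputs.
    groups = {
        "students": ["The claim is refuted by peer-reviewed evidence."],
        "model": ["The model gives a longer answer with several extra remarks."],
    }
    for name, answers in groups.items():
        print(name, average_word_count(answers))
```

Applied to the released outputs, the same functions would give one average per group and per course, which can then be compared with the figures reported in the abstract.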

KEYWORDS

ChatGPT; cheating; education; pedagogy; ProCoT; LLM

In-text citation: (Adewumi et al., 2025)
Reference: Adewumi, T., Alkhaled, L., Buck, C., Serrano Hernández, S., Brilioth, S., Kekung, M., . . . Barney, E. (2025). Probing Chain-of-Thought (ProCoT): Stimulating critical thinking and writing of students through engagement with large language models. Journal of Pedagogical Sociology and Psychology, 7(4), 227-245. https://doi.org/10.33902/jpsp.202536789

LICENSE

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.