Journal of Pedagogical Sociology and Psychology
Probing Chain-of-Thought (ProCoT): Stimulating critical thinking and writing of students through engagement with large language models
Tosin Adewumi 1*, Lama Alkhaled 1, Claudia Buck 1, Sergio Serrano Hernández 1, Saga Brilioth 1, Mkpe Kekung 1, Yelvin Ragimov 1, Elisa Barney 1
1 Luleå University of Technology, Sweden
* Corresponding Author
ARTICLE INFO

Journal of Pedagogical Sociology and Psychology, Online First, pp. 1-19
https://doi.org/10.33902/jpsp.202536789

Article Type: Research Article

Published Online: 24 Oct 2025


ABSTRACT
We introduce a novel writing method called Probing Chain-of-Thought (ProCoT), which potentially prevents students from cheating with a large language model while enhancing their critical thinking. Large language models have disrupted education and many other fields. For fear of students cheating, many educationists have resorted to banning their use. We conducted studies in two different courses with 65 students, using a primarily qualitative (phenomenological) research design together with quantitative methods. The students in each course were asked to prompt a large language model of their choice with one question from a set of four (random) questions and were required to affirm or refute statements in the model's output, using peer-reviewed references as evidence. In addition, the rubric for assessing the students' writing included five more criteria: focus, logic, content, style, and correctness. The average success rate of the students' writing on these criteria across the two cases is 79.49% (±12.82%). The results of the rubric assessment show two things: (1) Probing Chain-of-Thought stimulates students' critical thinking and writing through engagement with large language models, as seen when we compare the large language model-only output to the Probing Chain-of-Thought output; and (2) Probing Chain-of-Thought may prevent cheating because of clear limitations in the large language models concerned, as seen when we compare the students' Probing Chain-of-Thought output to the large language models' Probing Chain-of-Thought output. In the quantitative analysis, we also find that most students prefer to give answers in fewer words than large language models, which are typically verbose. The average word counts for students, ChatGPT 3.5, and Phind (v8) in the first course are 208, 391, and 383, respectively; in the second course, where we enforced a minimum word count of 300 for the students, the averages for students, ChatGPT 3.5, and BingAI are 405, 356, and 315, respectively.
We provide access to the outputs for possible assessments (available after review).
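The word-count comparison above can be illustrated with a minimal sketch. The averages and the 300-word minimum are taken from the abstract; the helper names (`word_count`, `meets_minimum`, `student_to_llm_ratio`) and the whitespace-based definition of a word are assumptions made for illustration and are not necessarily how the authors counted words.

```python
# Average word counts as reported in the abstract (course -> writer -> words).
REPORTED_AVERAGES = {
    "course 1": {"students": 208, "ChatGPT 3.5": 391, "Phind (v8)": 383},
    "course 2": {"students": 405, "ChatGPT 3.5": 356, "BingAI": 315},
}

# Minimum word count enforced for students in the second course.
MIN_WORDS_COURSE_2 = 300


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def meets_minimum(text: str, minimum: int = MIN_WORDS_COURSE_2) -> bool:
    """Return True if the text reaches the minimum word count."""
    return word_count(text) >= minimum


def student_to_llm_ratio(course: str) -> dict[str, float]:
    """Ratio of the students' average length to each model's average length."""
    averages = REPORTED_AVERAGES[course]
    students = averages["students"]
    return {
        name: round(students / words, 2)
        for name, words in averages.items()
        if name != "students"
    }


if __name__ == "__main__":
    for course in REPORTED_AVERAGES:
        print(course, student_to_llm_ratio(course))
```

A ratio below 1 means the students wrote less than the model on average. With the reported figures this holds in the first course, while in the second course, where the 300-word minimum applied, the students' average exceeds both models' averages.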
LICENSE
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.