Part-of-Speech Distribution across Proficiency and Advanced EFL Texts: A Quantitative Comparison for Pedagogical Application

Authors

  • Dragan Donev
  • Krste Iliev
  • Natalija Pop Zarieva

Keywords:

POS tagging, corpus linguistics, computational linguistics, data analysis, PERMANOVA

Abstract

This study investigated grammatical variation between Advanced Masterclass and Proficiency Masterclass EFL textbook and workbook texts to determine whether part-of-speech (POS) distributions change systematically across the CEFR C1–C2 interface. A balanced corpus of 60 reading texts (30 per level) was compiled, POS-tagged with spaCy, and analyzed quantitatively using Welch’s t-tests, Mann–Whitney U tests, effect sizes, false-discovery-rate correction, and robust 20% trimmed-mean tests. A multivariate PERMANOVA confirmed a small but significant global difference between levels (F = 2.624, p = .006, R² ≈ .03). Individual contrasts indicated that Proficiency texts contained relatively higher proportions of determiners and prepositions, while Advanced texts featured greater use of numerals, adjectives, and adverbs. Taken together, these small but systematic differences suggest that Proficiency texts use more cohesive, narrative-oriented grammar (determiners, pronouns, prepositions), whereas Advanced texts make relatively greater use of informational or expository elements (numerals, comparative adjectives, adverbs). The study illustrates how transparent, code-based POS profiling can reveal subtle grammatical distinctions in pedagogical materials and support evidence-informed textbook evaluation. By combining classical, non-parametric, robust, and multivariate analyses, the approach supports replicable results and provides a methodological template for future corpus-based research on advanced-level language input. The findings underscore the pedagogical value of aligning grammatical exposure with discourse progression from C1 to C2 in EFL instruction.
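The kind of per-text POS profiling and univariate testing summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes each text has already been reduced to a list of coarse POS labels (for example, the `pos_` attribute of spaCy tokens), and it uses the Benjamini-Hochberg procedure as one common form of false-discovery-rate correction; the abstract does not say which FDR method was applied. The function names `pos_proportions`, `welch_t`, and `benjamini_hochberg` are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def pos_proportions(tags, tagset):
    """Relative frequency of each tag in one text (tags: list of POS labels)."""
    if not tags:
        raise ValueError("text has no tokens")
    counts = Counter(tags)
    total = len(tags)
    return {tag: counts[tag] / total for tag in tagset}


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means (sample variances).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used here only as an example of FDR correction.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, `pos_proportions` would be applied to each of the 60 texts, `welch_t` to the 30 Advanced and 30 Proficiency proportions for each POS category, and `benjamini_hochberg` to the resulting set of p-values. Converting the t statistic to a p-value, and running the Mann–Whitney, trimmed-mean, and PERMANOVA analyses, would need a statistics library and is left out of this sketch.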

Author Biographies

  • Dragan Donev

    Goce Delchev University, Krste Misirkov 10-a, 2000 Shtip, North Macedonia

  • Krste Iliev

    Goce Delchev University, Krste Misirkov 10-a, 2000 Shtip, North Macedonia

  • Natalija Pop Zarieva

    Goce Delchev University, Krste Misirkov 10-a, 2000 Shtip, North Macedonia

Published

2025-11-19

Section

Articles

How to Cite

Dragan Donev, Krste Iliev, & Natalija Pop Zarieva. (2025). Part-of-Speech Distribution across Proficiency and Advanced EFL Texts: A Quantitative Comparison for Pedagogical Application. International Journal of Sciences: Basic and Applied Research (IJSBAR), 78(1), 342-352. https://www.gssrr.org/JournalOfBasicAndApplied/article/view/17649