A Reviewed Study on CPU-Optimized Parameter-Efficient Fine-Tuning for Large Language Models to Increase Accuracy Using LoRA

Notice

This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Year: 2026 | Volume: 13 | Issue: 01 | Page: -
By

  • Jahanvi Maheshwari,
  • Naveen Hemrajani,
  • Seema Sharma

  1. Student, Department of CSE, JECRC University, Jaipur, Rajasthan, India
  2. Student, Department of CSE, JECRC University, Jaipur, Rajasthan, India
  3. Assistant Professor, Department of CSE, JECRC University, Jaipur, Rajasthan, India

Abstract

The rapid proliferation of Large Language Models (LLMs) has increased the need to optimize the fine-tuning process, yet existing GPU-dependent workflows remain expensive, resource-intensive, and inaccessible to most researchers. Motivated by the goal of a more cost-efficient and democratized alternative, this paper examines a CPU-efficient implementation of Parameter-Efficient Fine-Tuning (PEFT) based on Low-Rank Adaptation (LoRA). The central question of the study is whether CPUs, combined with low-rank matrix updates, mixed-precision quantization, SIMD vectorization, operator fusion, and NUMA-aware scheduling, can match the accuracy of GPU-based LoRA at a lower computational cost. The proposed methodology presents a full CPU-LoRA pipeline consisting of selective adaptation of transformer layers, 4-bit to 8-bit quantization, BLAS-backed matrix operations, and dynamic memory-efficient batching. Open-source LLMs, including LLaMA, Falcon, and Mistral, were evaluated on standard NLP datasets. The evaluation metrics were accuracy, perplexity, F1/ROUGE scores, throughput, memory consumption, and power. Empirical evidence shows that CPU-optimized LoRA can be trained to roughly 92-98 percent of the accuracy of GPU-LoRA while offering considerable practical advantages: memory reductions of up to 70-90 percent and substantial operational and energy savings. The study also identifies the critical CPU bottlenecks and provides dedicated optimizations that make CPU-based fine-tuning a practical alternative for small institutions and researchers without GPU access. Overall, the results support the conclusion that CPU-optimized LoRA is a viable, cost-effective, and scalable option for LLM customization, furthering the larger goal of making high-quality model adaptation more accessible and sustainable.
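
To make the pipeline described above concrete, the following is a minimal sketch, assuming the Hugging Face Transformers and PEFT libraries, of how LoRA adapters can be attached to a causal language model and trained entirely on the CPU. It is not the authors' exact implementation: the model name (distilgpt2), rank, scaling factor, and target modules are illustrative assumptions, and the quantization, operator fusion, and NUMA-aware scheduling steps discussed in the paper are not shown.

    import os
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    # Let PyTorch's BLAS/OpenMP backend use the available CPU cores.
    torch.set_num_threads(os.cpu_count())

    model_name = "distilgpt2"  # small stand-in; the paper evaluates LLaMA, Falcon, Mistral
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)  # loads on CPU by default

    # LoRA freezes the base weights W and learns a low-rank update BA, so W' = W + BA.
    lora_config = LoraConfig(
        r=8,                        # low-rank dimension (assumed value)
        lora_alpha=16,              # scaling factor (assumed value)
        target_modules=["c_attn"],  # attention projection in GPT-2-style models
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the adapter matrices are trainable

    # One illustrative CPU training step on a toy batch.
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=2e-4
    )
    batch = tokenizer("CPU-optimized LoRA fine-tuning example.", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()

In a full pipeline, a sketch like this would be combined with the paper's remaining techniques, for example quantization of the frozen base weights and NUMA-aware thread placement, to reach the reported memory and throughput gains.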

Keywords: LLMs, LoRA, Parameter-Efficient Fine-Tuning, CPU Optimization, Model Compression, Low-Rank Adaptation, AI Efficiency

How to cite this article:
Jahanvi Maheshwari, Naveen Hemrajani, Seema Sharma. A Reviewed Study on CPU-Optimized Parameter-Efficient Fine-Tuning for Large Language Models to Increase Accuracy Using LoRA. Recent Trends in Parallel Computing. 2026; 13(01):-.
How to cite this URL:
Jahanvi Maheshwari, Naveen Hemrajani, Seema Sharma. A Reviewed Study on CPU-Optimized Parameter-Efficient Fine-Tuning for Large Language Models to Increase Accuracy Using LoRA. Recent Trends in Parallel Computing. 2026; 13(01):-. Available from: https://journals.stmjournals.com/rtpc/article=2026/view=242308



Ahead of Print | Subscription | Review Article
Volume 13, Issue 01
Received: 11/02/2026
Accepted: 13/03/2026
Published: 30/04/2026
Publication Time: 78 days

