Harmonization of Discrepant Data: A Solution to the Computational Models for Data Collection in the Tertiary Institutions

Aburuotu, E. C.
Nathaniel, A. O.

Abstract

The growing volumes of heterogeneous data produced by agencies and institutions in the education sector pose a significant challenge for data visualization, interoperability, analysis, and business decision-making. Harmonizing heterogeneous datasets is therefore a critical step for stakeholders in the sector. The legacy data analyzed for use in decision support systems and analytical applications is imported from various sources with different data types, database architectures, and structures, and all of it must be harmonized before it can serve the intended business solutions and growth. The Support Vector Machine (SVM) algorithm has emerged as a highly effective method for heterogeneous data harmonization, producing high-quality data that improves governance and fitness for purpose across the enterprise. The goal of this research was to develop and refine an SVM-based heterogeneous data harmonization solution for enterprise databases. A data harmonization technique was developed that integrates harmonization tools with a support vector machine learning algorithm, and the work was implemented in the JavaScript (JS) development environment. The study examined existing data production techniques on various active databases and followed the Rapid Application Development (RAD) methodology. Following supervised machine learning procedures, the system's models were trained and tested on both structured and unstructured data imported from Microsoft Excel. A total of 10,990 data sets were used: 8,393 (70%) for testing and 2,597 (30%) for training. The results show that the system successfully redefined the data headers and column dimensions as a means of coordinating the data imported into the system.
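
To make the idea of redefining data headers concrete, the following is a minimal JavaScript sketch of header harmonization across two sources. It is not the authors' SVM-based method: it swaps in a plain rule-based lookup, purely to illustrate what mapping differently labelled columns onto one canonical schema might look like. The schema `CANONICAL_HEADERS`, the functions `normalizeHeader`, `buildLookup`, and `harmonizeRows`, and the sample rows are all hypothetical and do not come from the study.

```javascript
// Hypothetical canonical schema: canonical header -> known variants.
// These names are illustrative assumptions, not taken from the study.
const CANONICAL_HEADERS = {
  student_id: ["student id", "matric no", "matriculation number", "reg no"],
  full_name: ["full name", "name", "student name"],
  department: ["department", "dept"],
};

// Lowercase a raw header and turn every run of non-alphanumeric
// characters (dots, underscores, extra spaces) into a single space.
function normalizeHeader(raw) {
  return String(raw)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Build a reverse lookup: normalized variant -> canonical header.
function buildLookup(schema) {
  const lookup = new Map();
  for (const [canonical, variants] of Object.entries(schema)) {
    lookup.set(normalizeHeader(canonical), canonical);
    for (const variant of variants) {
      lookup.set(normalizeHeader(variant), canonical);
    }
  }
  return lookup;
}

// Rename the keys of every row to canonical headers. Unknown headers
// are kept under their normalized name so that no data is dropped.
function harmonizeRows(rows, schema = CANONICAL_HEADERS) {
  const lookup = buildLookup(schema);
  return rows.map((row) => {
    const out = {};
    for (const [header, value] of Object.entries(row)) {
      const key = normalizeHeader(header);
      out[lookup.get(key) ?? key.replace(/ /g, "_")] = value;
    }
    return out;
  });
}

// Made-up rows standing in for two spreadsheets with different headers.
const sourceA = [{ "Matric No.": "U2019/001", "Student Name": "Ada", Dept: "CSC" }];
const sourceB = [{ REG_NO: "U2020/114", "Full Name": "Obi", Department: "MTH", Level: 200 }];

const harmonized = harmonizeRows([...sourceA, ...sourceB]);
console.log(harmonized);
```

In this sketch both sources end up with the same `student_id`, `full_name`, and `department` columns, while the unmapped `Level` column is carried through as `level`. A learned classifier such as the SVM described in the abstract would presumably replace the fixed variant lists with a model that predicts the canonical header from features of the column.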

How to Cite

Harmonization of Discrepant Data: A Solution to the Computational Models for Data Collection in the Tertiary Institutions. (2025). Innovative: International Multidisciplinary Journal of Applied Technology (2995-486X), 3(11), 90-106. https://doi.org/10.51699/1ndcxz69
