CONNECT WITH US

'Alpha Three' team optimizes data pre-processing to significantly improve AI assistant question-answering accuracy

News highlights 0

Generative AI (GenAI) is swiftly revolutionizing corporate operations, product development, business models, and the overall ecosystem. According to a survey report published by Taiwan's Market Intelligence & Consulting Institute (MIC), in 2024, 19% of Taiwan's five major industries utilized GenAI or engaged in related activities, with the finance and insurance sector representing 25% and the manufacturing sector following at 22%. Amid the proliferation of Generative AI for developing AI assistants, some firms have found that their substantial investments in these assistants did not yield the expected results, leading them to terminate their AI projects and thus diminishing their overall competitiveness.

Alpha Three wins with a smarter AI assistant—boosting accuracy through improved data chunking for enterprise knowledge Q&A. Credit: Company

Alpha Three wins with a smarter AI assistant—boosting accuracy through improved data chunking for enterprise knowledge Q&A. Credit: Company

The primary cause of the poor performance of AI assistants, according to "Alpha Three," the winning team in the "2025 AI Wave: Taiwan Generative AI Applications Hackathon" from Walsin Lihwa's "Smart Manufacturing" group, is the excessively small data chunking during pre-processing. This can easily disrupt the original document paragraph context, resulting in the AI model's misunderstanding deviations and a response content that is not sufficiently accurate.

The team recommended that the "amount of text in a single PDF page" be used as the unit of chunk in order to preserve the natural paragraph structure and comprehensive context, as well as to prevent semantic discontinuity. The review committee unanimously recognized this method for successfully achieving three substantial advantages: "optimizing user query experience," "reducing the risk of hallucinations," and "enhancing semantic coherence and search and answer accuracy."

Pre-competition training proves valuable; effectively utilizing AI tools to realize creativity

"Alpha Three" utilized a steel standard inquiry as a test case and posed the question, "Does ASTM A276 steel grade 316Ti comply with the EN 10088-3 standard?" The AI system retrieved comprehensive information covering the chemical composition and standard specifications of steel grades. The content of the response is highly focused and accurately reflects the primary data. The AI system demonstrated extraordinary reliability in the application of enterprise knowledge by achieving a perfect score (1.0 out of 1.0) across the three metrics of "search relevance," "answer solidity," and "answer relevance."

To achieve these results, the team utilized Amazon Web Services (AWS) to develop a comprehensive enterprise knowledge question-answering framework.

PDF, PNG, JPG, and other file formats are uploaded to Amazon S3, the cloud object storage service, during the initial phase. The second phase is providing quick query services with a comprehensive language model and the Flask API. In the third phase, the team leverages Amazon Bedrock, a fully managed service that makes high-performing foundation models, to connect extensive language models, thereby improving scalability and reaction speed. Amazon Elastic Compute Cloud (Amazon EC2) is employed in the fourth phase to expedite API processing, thereby guaranteeing system stability and efficiency.

The extensive system design includes data uploading, management, retrieval, and response, allowing users to easily submit inquiries and obtain prompt professional responses, which became a crucial factor in their victory.

The "Alpha Three" team, comprised of recent information engineering graduates from National Taiwan University in 2024, observed that, despite their degrees in information-related fields, they were completely unfamiliar with contemporary mainstream AI tools in the face of the rapid advancement of Generative AI technology. The project was successfully completed within 30 hours, and the award was secured, thanks to the professional training provided by the organizer, which included a series of enterprise data workshops and AWS Generative AI workshops, as well as Walsin Lihwa's explanation of the steel standards.

Article edited by Sherri Wang