The core task of this thesis is to combine LLM-based embeddings with a vector database in order to find similar user story descriptions within large-scale agile software projects. The work comprises the following steps:
• Dataset Collection & Preprocessing: Collect user stories and preprocess them to ensure a uniform format, remove noise, and make the text suitable as input to an LLM.
• LLM Implementation: Use an LLM to generate embeddings of the user stories that capture the semantic essence of each story.
• Vector Database Integration: Store these embeddings in a vector database, ensuring efficient querying capabilities. This setup will enable the fast retrieval of similar user stories based on their vector representations.
• Similarity Analysis: Design and implement a robust mechanism to query the vector database to identify similar user stories. This step will involve determining a suitable similarity threshold and optimizing for both accuracy and computational efficiency.
• Evaluation: Assess the accuracy, efficiency, and scalability of the implemented system. This will involve creating test sets, defining metrics for evaluation, and comparing results against other standard methodologies if available.
• Insights & Recommendations: Beyond similarity detection, the thesis should also offer insights into patterns of redundancy in user stories and recommend ways to optimize user story creation and management in agile projects.
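
The preprocessing, embedding, storage, and similarity-query steps above can be sketched as a minimal pipeline. This is an illustration under stated assumptions, not the system the thesis will build: `embed` is a toy hashed bag-of-words encoder standing in for a real LLM embedding model, `InMemoryVectorStore` is a hypothetical brute-force stand-in for a vector database, and the story texts, the identifiers, and the threshold of 0.8 are invented for the example. A suitable threshold would have to be determined empirically during the Similarity Analysis step.

```python
import hashlib
import math
import re

DIM = 1024  # width of the toy embedding (assumption, not a model property)


def preprocess(story):
    """Lowercase the story and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", story.lower())


def embed(story):
    """Toy stand-in for an LLM encoder: hashed bag-of-words, L2-normalized.

    A real system would call an embedding model here instead.
    """
    vec = [0.0] * DIM
    for token in preprocess(story):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._vectors = {}

    def add(self, story_id, story):
        """Embed a story and store the vector under its identifier."""
        self._vectors[story_id] = embed(story)

    def query(self, story, threshold, top_k=5):
        """Return up to top_k (id, cosine similarity) pairs >= threshold."""
        q = embed(story)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sid, sum(a * b for a, b in zip(q, v)))
            for sid, v in self._vectors.items()
        ]
        hits = [(sid, score) for sid, score in scored if score >= threshold]
        return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]


# Usage with invented example stories.
store = InMemoryVectorStore()
store.add("US-1", "As a user, I want to reset my password so that I can "
                  "regain access to my account.")
store.add("US-3", "As an admin, I want to export monthly sales reports "
                  "as CSV files.")

hits = store.query(
    "As a user, I want to reset my password so I can access my account again.",
    threshold=0.8,
)
for story_id, score in hits:
    print(f"{story_id}: {score:.2f}")
```

Because the toy encoder only counts shared words, it would miss paraphrases that use different vocabulary; capturing those is the reason for using LLM embeddings. The brute-force scan likewise compares the query against every stored vector, which is the cost a vector database's index is meant to avoid at scale.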