Information Retrieval and NLP
Master of Science in Data Science and Artificial Intelligence
Winter Semester 2024/2025
Information retrieval focuses on accessing and extracting information from various document collections. This course explores key concepts and methods in information retrieval, with an emphasis on practical applications in web-based environments.
Because text-based information retrieval often relies on natural language processing (NLP), the course also covers essential NLP techniques. Students will gain hands-on experience by building NLP pipelines and training custom machine learning models for real-world information retrieval tasks.
Learning Outcomes
- Understanding information retrieval system architectures
- Implementing NLP pipelines
- Implementing recommender systems
Topics
- Introduction. Boolean retrieval model.
- The term vocabulary.
- Term weighting (tf-idf), the vector space model.
- Computing scores in a complete search system.
- Evaluation in Information Retrieval.
- NLP pipelines.
- Generating NLP training corpora.
- Matrix decomposition and latent semantic indexing.
- Recommender systems.
- Creating user profiles from click behaviour data.
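To give a flavour of the term weighting and vector space model topic listed above, the following is a minimal sketch, not course material. It assumes a whitespace tokenizer, a logarithmic term frequency, and an inverse document frequency of log(N/df), which is one common weighting variant among several. The function names (`tokenize`, `build_tfidf`, `cosine`, `rank`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def build_tfidf(docs):
    """Return one sparse tf-idf vector (dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Assumed idf variant: log(N / df).
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed tf variant: 1 + log(raw count).
        vectors.append({t: (1 + math.log(c)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, highest score first."""
    vectors, idf = build_tfidf(docs)
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: (1 + math.log(c)) * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)
```

Calling `rank("vector space", docs)` on a small list of strings returns the documents ordered by cosine similarity to the query. Documents that share no terms with the query score zero.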
References
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press.
Hoxha, K., & Baxhaku, A. (2018). An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition. Cybernetics and Information Technologies, 18(1).