Information Retrieval and NLP

Master of Science in Data Science and Artificial Intelligence 

Winter Semester 2024/2025

Information retrieval focuses on accessing and extracting information from various document collections. This course explores key concepts and methods in information retrieval, with an emphasis on practical applications in web-based environments.

Given that text-based information retrieval often relies on natural language processing (NLP), the course also covers essential NLP techniques. Students will gain hands-on experience by building NLP pipelines and developing custom-trained machine learning models for real-world information retrieval tasks.

Learning Outcomes

  • Understanding information retrieval system architectures
  • Implementing NLP pipelines
  • Implementing recommender systems

Topics

  1. Introduction. Boolean retrieval model.
  2. The term vocabulary.
  3. Term weighting (tf-idf), the vector space model.
  4. Computing scores in a complete search system.
  5. Evaluation in information retrieval.
  6. NLP pipelines.
  7. Generating NLP training corpora.
  8. Matrix decomposition and latent semantic indexing.
  9. Recommender systems.
  10. Creating user profiles from click behaviour data.

References

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press.

Hoxha, K., & Baxhaku, A. (2018). An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition. Cybernetics and Information Technologies, 18(1).