IMI Showtime - HTW Berlin

Organization

GitHub for code collaboration and distribution
Trello for task organization
Figma for visualization prototyping

Core

Python

Python was selected because it is one of the most widely used programming languages for data science. Given its extensive range of available packages, it proved to be the most effective language for our problem.

PyMuPDF

The PyMuPDF library was used to extract data from PDF documents, as it is one of the fastest Python libraries for this purpose.
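
As an illustration of this extraction step, the following is a minimal sketch rather than the project's actual code. The function names `extract_pages` and `normalize_whitespace` are hypothetical, and the whitespace cleanup is an assumed preprocessing step that the text above does not describe.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return " ".join(text.split())


def extract_pages(pdf_path: str) -> list:
    """Return the plain text of each page in the PDF, one string per page."""
    # Imported lazily so the helper above can be used without PyMuPDF installed.
    import fitz  # PyMuPDF; recent releases can also be imported as `pymupdf`

    # The document is closed automatically when the `with` block exits.
    with fitz.open(pdf_path) as doc:
        # page.get_text() returns the page content as plain text by default.
        return [normalize_whitespace(page.get_text()) for page in doc]
```

Calling `extract_pages("report.pdf")` with a placeholder file name would yield one cleaned string per page, which can then be passed on to the language processing step.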

spaCy

spaCy is a high-quality package for natural language processing. We chose it because it provides many ready-to-use pipeline components that need little configuration.
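
To show what "little configuration" can look like in practice, here is a hedged sketch of named entity extraction with a pretrained spaCy pipeline. The helper names, the model name `en_core_web_sm`, and the choice to filter for `ORG` entities are assumptions made for illustration; the text above does not state which model or entity types the project uses.

```python
def extract_entities(text, nlp, labels=("ORG",)):
    """Return unique (text, label) pairs in order of first appearance.

    `nlp` is any callable that returns an object with an `ents` attribute,
    such as a loaded spaCy pipeline.
    """
    seen = set()
    result = []
    for ent in nlp(text).ents:
        pair = (ent.text, ent.label_)
        if ent.label_ in labels and pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


def load_pipeline(model_name="en_core_web_sm"):
    """Load a pretrained spaCy pipeline (the model name is an assumption)."""
    # Imported lazily; the model must be installed separately beforehand.
    import spacy

    return spacy.load(model_name)
```

A typical call would be `extract_entities(page_text, load_pipeline())`. Passing the pipeline in as an argument keeps the filtering logic independent of which model is loaded.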

Python LLM

Python LLM provides a simple API that wraps many existing large language models.

Web View

HTML
JavaScript
CSS

M4 Master OrgXtract
