GitHub is a web platform built on Git, a version control system. Researchers and programmers can use it to maintain a complete history of their code, showing exactly what was changed and when. This history is important for reproducibility because it allows you to return to any previous version and trace how the code has evolved. Teams collaborate by proposing changes, reviewing each other’s code, and merging updates systematically, so multiple people can work on the same project without overwriting each other’s work. The platform also serves as a public archive where people can share and publish their final code.
Hugging Face is a platform that hosts and develops open-source tools for machine learning. It provides access to hundreds of thousands of pre-trained models and datasets, allowing researchers and developers to share, reproduce, and build upon existing work. By making models, datasets, and training code openly available, Hugging Face promotes transparency, collaboration, and reproducibility in AI research. The company has also developed widely used tools such as Transformers, Datasets, and Diffusers, which have become standard resources for building and fine-tuning machine learning models. Today, it serves as a central hub for AI development, hosting over 500,000 models across various domains, including text analysis, image recognition, and audio processing.
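As a minimal sketch of how hosted models are used, the Transformers library can load a public checkpoint in a few lines (the model id below is one example among the many available):

```python
from transformers import pipeline

# Download a hosted model from the Hugging Face Hub and build an inference pipeline.
# "distilbert-base-uncased-finetuned-sst-2-english" is one public sentiment checkpoint.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

print(classifier("Open tools make research easier to reproduce."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```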
Kaggle is an online community and platform for data science and machine learning. It hosts a vast collection of datasets, public code notebooks, and structured competitions. These competitions challenge participants to solve real-world problems, from predicting housing prices to detecting diseases in medical images, by building and submitting predictive models. Kaggle’s primary role is to provide benchmarks for model performance and serve as a repository of data. Researchers and practitioners use it to find datasets, test methods against others, and review shared code notebooks.
Google Colab (Colaboratory) is a cloud-based environment for running Jupyter Notebooks in the browser. It enables users to write and execute Python code without local setup and provides access to computing resources such as CPUs, GPUs, and TPUs. This service allows researchers without expensive local hardware to run complex machine learning calculations. It integrates with Google Drive, making it easy to collaborate on code and data. The platform is widely used in research and education for developing, testing, and documenting AI methods in an accessible format.
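A minimal sketch of the Drive integration, using the `google.colab` helper that is available inside Colab notebooks:

```python
# Inside a Colab notebook: mount Google Drive so data and results persist
# across sessions (this helper exists only in the Colab environment).
from google.colab import drive

drive.mount('/content/drive')

# Files are then accessible under /content/drive/MyDrive/
with open('/content/drive/MyDrive/example.txt', 'w') as f:
    f.write('saved from Colab')
```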
Python is a widely used programming language valued for its readable syntax and extensive library ecosystem. Researchers and programmers across disciplines use it for data analysis, statistical modeling, machine learning, and workflow automation. Python's popularity stems from its practical advantages: code written in Python tends to be easier to understand and maintain than alternatives, which is crucial when teams share methods or revisit projects years later. Python is also the dominant language in modern machine learning, providing frameworks such as TensorFlow and PyTorch that enable the development, training, and deployment of complex predictive and generative models.
The Hugging Face Open LLM Leaderboard is a public benchmark platform that ranks open-source large language models based on standardized evaluations across reasoning, knowledge, math, and other core capabilities. It helps researchers quickly identify which models perform best on tasks similar to their own, allowing them to shortlist strong candidates for analyzing unstructured data. Once a suitable model is identified, researchers can download or deploy it directly from Hugging Face, fine-tune it on their own datasets, and integrate it into custom pipelines for tasks such as text mining, document classification, coding qualitative data, or extracting insights from large corpora.
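A minimal sketch of this workflow, assuming a model has already been shortlisted on the leaderboard (the model id below is a placeholder example, not a recommendation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# After shortlisting a model on the leaderboard, load it by its Hub id.
# The id below is only an illustrative choice; substitute your own candidate.
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")  # needs accelerate

inputs = tokenizer("Classify the topic of this abstract:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```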
The RoBERTa-base model was trained on ~124M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark. It builds on the original Twitter-based RoBERTa model, with TweetEval as the reference paper, and is suitable for English text.
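A minimal usage sketch, assuming the checkpoint described here is cardiffnlp's Twitter RoBERTa sentiment model on the Hugging Face Hub:

```python
from transformers import pipeline

# Assumed checkpoint for the model described above.
sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment-latest")

print(sentiment("I love the new design of this app!"))
# e.g. [{'label': 'positive', 'score': 0.98...}]
```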
The Whisper collection on Hugging Face is a curated set of open-source automatic speech recognition (ASR) and speech-translation models released by OpenAI. These multilingual models (ranging from tiny to large sizes) are pretrained on hundreds of thousands of hours of audio and are designed to “just work” out of the box for transcription and translation tasks. Researchers working with unstructured audio or multimodal datasets can use this collection by: (1) selecting an appropriate model checkpoint from the collection that fits their compute budget and language needs; (2) deploying the model via the Transformers library to generate transcriptions or translations of their audio data; and (3) optionally fine-tuning or adapting the model on their own domain-specific recordings (e.g., conversational speech, field recordings) to improve accuracy and relevance to their research context.
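A minimal sketch of steps (1) and (2), using the Transformers ASR pipeline with one mid-sized checkpoint (the audio path is a placeholder):

```python
from transformers import pipeline

# Step (1): pick a checkpoint that fits the compute budget (here: whisper-small).
# Step (2): deploy it via the Transformers ASR pipeline.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcribe a local recording; "interview.wav" is a placeholder path.
result = asr("interview.wav")
print(result["text"])
```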
Stable Baselines3 is one of the most user-friendly and reliable frameworks for running reinforcement learning experiments. It provides clean, well-tested implementations of popular RL algorithms, including PPO, DQN, A2C, and others, all accessible through a simple and consistent Python interface. Researchers can use SB3 to quickly prototype, train, and evaluate RL agents without needing deep knowledge of algorithmic internals. Its strong documentation, example scripts, and compatibility with Gymnasium (formerly OpenAI Gym) environments make it easy to integrate into research workflows, whether for experimentation, method comparison, or building intelligent agents that interact with unstructured data.
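A minimal training sketch with PPO on a classic control task (hyperparameters left at their defaults):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Train a PPO agent through SB3's consistent interface.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll out the trained policy for one episode.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```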
Side note: Reinforcement learning is a complex and often unstable field of machine learning. While tools like Stable Baselines3 make experimentation easier, RL still requires solid methodological experience to design environments, tune algorithms, and interpret results. It is generally recommended for researchers with some background in machine learning or experimental modeling.
YOLOv11 is one of the latest iterations of the YOLO object-detection family, designed to deliver fast, accurate, and efficient performance right out of the box. It improves on earlier YOLO versions with stronger backbones, better feature extraction, and more robust performance across diverse lighting, motion, and scene conditions. For researchers, YOLOv11 offers an excellent plug-and-play baseline for object detection, segmentation, and tracking—especially when real-time speed or deployment on edge devices is important. While it may not represent the absolute state of the art in every research scenario, it provides a highly practical, well-supported starting point for analyzing unstructured visual data.
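A minimal sketch using the Ultralytics Python API, assuming the pretrained nano checkpoint and a placeholder image path:

```python
from ultralytics import YOLO

# Load a pretrained nano checkpoint (downloaded automatically on first use).
model = YOLO("yolo11n.pt")

# Run inference on an image; "street.jpg" is a placeholder path.
results = model("street.jpg")

# Inspect detected classes, confidences, and bounding boxes.
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)
```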
DETR (Detection Transformer) is a modern object-detection architecture that replaces traditional hand-crafted components with a pure transformer-based design. Instead of using anchors or region proposals, DETR formulates detection as a set prediction problem, making the entire pipeline end-to-end trainable. For researchers, this offers a cleaner, more flexible foundation for experimentation, especially when working with complex scenes, small objects, or tasks that benefit from global attention. While DETR is typically slower to train than YOLO, it often delivers stronger performance in challenging settings and is an excellent choice for advancing or benchmarking state-of-the-art computer-vision research.
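A minimal inference sketch with the Transformers implementation of DETR, using the standard ResNet-50 release (the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

image = Image.open("scene.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Convert DETR's set predictions into labeled boxes above a confidence threshold.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9)[0]
for label, score, box in zip(results["labels"], results["scores"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 2), box.tolist())
```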
The Improved Aesthetic Predictor is a machine-learning model designed to estimate the aesthetic quality of images. Developed as part of the LAION project, it builds on large-scale training from the LAION-5B dataset, one of the largest open image–text collections available. The model is trained on human aesthetic ratings, enabling it to learn visual patterns associated with beauty, composition quality, color harmony, and overall image appeal. It uses image data as input and processes it through a neural network fine-tuned specifically for aesthetic scoring. The final output is a continuous aesthetic score, typically ranging from low (less visually appealing) to high (high-quality or artistically compelling). This makes it useful for tasks like dataset curation, image ranking, generative model evaluation, and filtering outputs in workflows that require visually appealing images.
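The general pattern is a small regression head over CLIP image embeddings. The sketch below is schematic rather than the project's exact code: the linear head and weights file are hypothetical stand-ins for the trained predictor.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Schematic pattern: CLIP image embedding -> regression head -> aesthetic score.
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

head = nn.Linear(768, 1)  # hypothetical stand-in for the trained scoring head
# head.load_state_dict(torch.load("aesthetic_head.pt"))  # hypothetical weights file

image = Image.open("photo.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    emb = clip.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # score the normalized embedding
    score = head(emb)
print(float(score))  # continuous aesthetic score
```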
CLIP (2021), developed by OpenAI, is a multimodal neural network that learns to connect images with natural language descriptions. Instead of training on labeled datasets, CLIP is trained on 400 million image–text pairs collected from the internet, enabling it to understand visual content through its alignment with text. The model uses two separate encoders, a Vision Transformer or CNN for images and a Transformer-based text encoder, to map both modalities into a shared embedding space. The training objective is contrastive: matching images and captions are pulled closer together, while mismatched pairs are pushed apart. This allows CLIP to perform zero-shot image classification, text-based image retrieval, and image similarity scoring without requiring task-specific fine-tuning. The input consists of images and corresponding text prompts, and the output is a set of similarity scores that indicate how well an image aligns with a given textual description.
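A minimal zero-shot classification sketch with the Transformers CLIP implementation (the image path is a placeholder):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]

# Score the image against each caption in the shared embedding space.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # zero-shot class probabilities
print(dict(zip(labels, probs[0].tolist())))
```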
FairFace is a face-attribute prediction model trained on a large, demographically balanced dataset designed to reduce racial and gender bias in computer vision systems. The model predicts three key facial attributes: race, age group, and gender. It uses aligned face images as input and outputs probability distributions over demographic categories. Race can be predicted using either a 4-class or 7-class setup, gender is classified as male or female, and age is assigned to predefined age bins ranging from early childhood to older adulthood. FairFace is widely used for demographic analysis, dataset auditing, and fairness research due to its balanced representation across ethnic groups.
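A schematic loading sketch based on the released 7-class PyTorch model, a ResNet-34 with 18 output logits (7 race + 2 gender + 9 age bins); the weights filename is a placeholder, and the input tensor stands in for a properly aligned face crop:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Schematic: ResNet-34 backbone with an 18-way output head.
# "fairface_res34_7.pt" is a placeholder for the downloaded weights file.
model = models.resnet34()
model.fc = nn.Linear(model.fc.in_features, 18)
model.load_state_dict(torch.load("fairface_res34_7.pt", map_location="cpu"))
model.eval()

face = torch.randn(1, 3, 224, 224)  # stand-in for an aligned, normalized face crop
with torch.no_grad():
    logits = model(face)
race = logits[:, :7].softmax(dim=1)     # 7-class race distribution
gender = logits[:, 7:9].softmax(dim=1)  # male / female
age = logits[:, 9:18].softmax(dim=1)    # 9 age bins
```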
Librosa is a Python library designed for analyzing and processing audio, with a strong focus on music and sound features. It provides tools to load, manipulate, and visualize audio data, making it an essential toolkit for music information retrieval and audio-based machine learning. With Librosa, users can easily perform tasks such as extracting features like spectrograms, MFCCs, and chroma representations, detecting tempo and beats, identifying pitches and onsets, or modifying sounds through time-stretching and pitch-shifting. It is built on top of scientific Python libraries such as NumPy, SciPy, and Matplotlib, which ensures compatibility and efficiency in data analysis workflows. In essence, Librosa enables researchers, musicians, and developers to explore the structure and patterns of sound, bridging the gap between audio signal processing and practical music analysis.
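A minimal feature-extraction sketch (the audio path is a placeholder):

```python
import librosa

# Load an audio file (resampled to librosa's default 22,050 Hz, mono).
y, sr = librosa.load("song.wav")  # placeholder path

# Extract common features for music information retrieval.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # timbre
chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # pitch-class energy
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)    # tempo and beat frames

print(f"Estimated tempo: {float(tempo):.1f} BPM, MFCC shape: {mfccs.shape}")
```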
MusicGen is an open-source text-to-music generation model that creates music directly from text descriptions. It is a single-stage auto-regressive Transformer trained using an EnCodec tokenizer (32 kHz audio, 4 codebooks sampled at 50 Hz), allowing it to produce high-quality, controllable audio outputs. There are also first attempts at fine-tuning MusicGen, such as LoRA fine-tuning and DreamBooth-style customization, enabling users to specialize the model on specific genres, artists, or musical styles.
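A minimal generation sketch with the Transformers implementation, using the small checkpoint:

```python
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

# Describe the desired music in plain text.
inputs = processor(text=["lo-fi hip hop beat with warm piano chords"],
                   padding=True, return_tensors="pt")
audio = model.generate(**inputs, max_new_tokens=256)  # ~5 seconds of audio

rate = model.config.audio_encoder.sampling_rate  # 32 kHz
scipy.io.wavfile.write("musicgen_out.wav", rate=rate, data=audio[0, 0].numpy())
```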
Weights & Biases is a hosted experiment-tracking and MLOps platform focused on visualization and collaboration. With a few lines of code, users can log metrics, hyperparameters, model checkpoints, and system information for every training run. Interactive dashboards make it easy to compare experiments, monitor long-running trainings, and share results with collaborators. It integrates with common ML frameworks such as PyTorch, TensorFlow, and scikit-learn, and is widely used in both research and industry to manage complex training workflows.
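A minimal logging sketch (the project name and metric values are placeholders):

```python
import wandb

# Start a tracked run; config records hyperparameters for later comparison.
run = wandb.init(project="demo-project", config={"lr": 1e-3, "epochs": 5})

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)  # stand-in for a real training metric
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()
```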
MLflow is an open-source platform that supports the full machine learning lifecycle: experiment tracking, model packaging, and model registry. It lets teams log parameters, metrics, artifacts, and code versions for each run in a central place, which makes experiments comparable and reproducible. Models can be stored in a registry and deployed in standardized formats (e.g., REST endpoints), helping projects move from exploratory notebooks to stable, production-ready ML workflows.
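A minimal tracking sketch (the parameter, metric, and artifact values are placeholders):

```python
import mlflow

# Log one experiment run: parameters, metrics, and an artifact.
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("C", 1.0)
    mlflow.log_metric("accuracy", 0.87)           # stand-in value
    mlflow.log_artifact("confusion_matrix.png")   # placeholder file path
```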
Streamlit is a Python framework for building interactive web apps for machine learning and data science with minimal effort. By adding a few Streamlit function calls to a Python script, researchers can turn models and analyses into web interfaces with sliders, text boxes, file uploads, and visualizations. This makes it easy to create small tools and demos that allow non-technical stakeholders to explore ML models on text, images, or tabular data—without writing any front-end code.
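A minimal app sketch; saved as a script and launched with `streamlit run app.py`, it renders a file upload and a preview slider:

```python
import streamlit as st
import pandas as pd

st.title("Quick data explorer")

# File upload and an interactive widget, no front-end code required.
uploaded = st.file_uploader("Upload a CSV file", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    n = st.slider("Rows to preview", 1, 50, 5)
    st.dataframe(df.head(n))
```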
KI Campus is a learning platform for artificial intelligence, funded by the German Federal Ministry of Education and Research. Learners can access self-paced courses covering topics from introductory AI concepts to natural language processing, computer vision, machine learning, and data literacy. Courses range from beginner-friendly offerings that require no prerequisites to advanced technical training, including hands-on Python programming and practical implementation. Upon completion, participants earn certificates, with some course series qualifying for micro-degrees.
Khan Academy is a non-profit organization that offers foundational computing courses, including an introduction to computer science course that teaches Python programming through project-based learning. The curriculum includes units on data science themes, simulations, and game design, with students building portfolios that address real-world problems, such as recommendation engines and disease modeling. Their AP Computer Science Principles course covers programming, algorithms, and data analysis with over 600 practice questions and instructional videos. Khan Academy also offers comprehensive statistics and probability courses, covering topics such as hypothesis testing, descriptive statistics, probability distributions, and regression. All courses are free and self-paced.
FreeCodeCamp is a non-profit organization that offers free, self-paced certifications in data science and machine learning through project-based learning. Each certification requires approximately 300 hours of work and completion of five projects, with optional practice lessons. The Machine Learning with Python certification covers TensorFlow, neural networks, natural language processing, and reinforcement learning. Learners work through practical projects, including image classification, audio classification, and text classification using BERT. Courses teach Python libraries, such as NumPy, Pandas, and Matplotlib, through hands-on coding exercises with automated tests. FreeCodeCamp utilizes interactive notebooks and emphasizes learning through the development of real-world projects.
Harvard offers several opportunities for AI learning across different formats. Through its HarvardX platform, learners can access free, self-paced online courses such as CS50’s Introduction to Artificial Intelligence with Python and the Data Science Professional Certificate, which introduce machine learning and data analysis concepts using practical programming examples. For those seeking formal academic credentials, the Harvard Extension School provides graduate certificate programs in Artificial Intelligence and Data Science.
MIT OpenCourseWare provides free access to more than 2,500 courses across MIT’s undergraduate and graduate programs, including a wide range of offerings in artificial intelligence, machine learning, and data science. For example, the Introduction to Deep Learning course covers applications in computer vision, natural language processing, and biology, offering practical experience in building neural networks. Each course includes lecture videos, problem sets, programming assignments, and exams from actual MIT classes, all of which are freely available for self-paced learning and teaching without enrollment.
The Sound of AI YouTube channel, created by Valerio Velardo, is a resource-rich hub dedicated to the intersection of artificial intelligence and audio/music. It offers in-depth tutorials on audio signal processing and deep learning for music, guiding viewers from the fundamentals of sound to practical Python techniques for analyzing and generating music.
This project trains a neural network to recognize and classify handwritten digits from 0 to 9, a classic and widely studied task in machine learning. Known as the MNIST digit classification problem, it serves as a benchmark for evaluating image recognition models and neural network architectures. The model learns to detect visual patterns such as curves, edges, and loops that distinguish one digit from another. It uses image data, specifically the MNIST dataset, which contains 70,000 28×28 pixel grayscale images. It produces a class prediction, outputting the single digit (from 0 to 9) that it identifies in the input image.
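A minimal sketch of one common setup, a small Keras feed-forward network (the original project may use a different architecture):

```python
import tensorflow as tf

# Load the 70,000 MNIST images (60k train / 10k test); scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small feed-forward classifier over the 28x28 grayscale inputs.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one probability per digit
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3)
print(model.evaluate(x_test, y_test))
```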
This project trains an agent to play the Atari game Breakout using Deep Q-Learning, a method that combines reinforcement learning with deep neural networks. The agent learns through trial and error, receiving rewards for successful actions and using a neural network to approximate Q-values that estimate future rewards. The data consists of preprocessed visual frames from the Atari game environment, specifically 84×84 grayscale images stacked in groups of four consecutive frames to capture motion information, processed through the Gymnasium library (the maintained successor to OpenAI Gym). The outcome is a trained policy that maps visual game states to actions (moving the paddle left, right, or staying still); by maximizing cumulative reward, the agent learns to play the game and can reach scores comparable to human performance after sufficient training frames.
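A minimal sketch using Stable Baselines3's Atari helpers, which apply the 84×84 grayscale preprocessing and frame stacking described above (requires the ale-py package and the Atari ROMs; the hyperparameters are illustrative):

```python
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Atari preprocessing (84x84 grayscale) plus stacking 4 consecutive frames.
env = make_atari_env("ALE/Breakout-v5", n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# CNN-based Q-network; real runs need millions of timesteps to approach
# human-level scores.
model = DQN("CnnPolicy", env, buffer_size=100_000, learning_starts=10_000, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("dqn_breakout")
```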
The Titanic Survival Prediction is widely used to teach predictive modeling concepts. It uses passenger data from the 1912 shipwreck to predict whether individuals survived based on features such as age, gender, class, and fare. The task involves data cleaning, feature engineering, and training classification models like Logistic Regression, Random Forest, or XGBoost to forecast binary outcomes. The data consists of CSV files with passenger details, and the output consists of survival predictions (0 = did not survive, 1 = survived).
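A minimal modeling sketch with scikit-learn, assuming the standard Kaggle train.csv file:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the competition CSV; "train.csv" is the standard Kaggle file name.
df = pd.read_csv("train.csv")

# Minimal feature engineering: encode sex, fill missing ages.
df["Sex"] = (df["Sex"] == "female").astype(int)
df["Age"] = df["Age"].fillna(df["Age"].median())
X = df[["Pclass", "Sex", "Age", "Fare"]]
y = df["Survived"]  # 0 = did not survive, 1 = survived

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Validation accuracy:", clf.score(X_val, y_val))
```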
This paper shows how generative AI can be fine-tuned to create highly effective advertising images. The authors develop a workflow that trains Stable Diffusion on product visuals, competitive ads rated on AIDA mindset metrics, and images reflecting brand personality traits. The resulting model can generate ads that consistently outperform conventional banner ads in attention, interest, desire, and activation, and can also express targeted brand personalities such as ruggedness or luxury. Across several studies, the authors demonstrate that AI-generated ads rival or exceed human-produced ads, generalize to click-through behavior, and can be tailored to specific consumer segments at almost no marginal cost. The work highlights how marketers can incorporate consumer feedback directly into generative models to automate high-quality ad creation.
With the aim of leveraging vast user-generated data, the authors developed an advanced text-to-emotion converter. Specifically, they introduce NADE (Natural Affect DEtection), a novel text-to-emoji-to-emotion converter that first “emojifies” language and then converts these emojis into intensity measures of well-established, theory-grounded emotions. Using human raters and state-of-the-art converters as benchmarks, the authors establish the benefits of exploiting emojis, validate NADE, and demonstrate its use in several marketing applications using data from various social media platforms. Users can apply the proposed converter through an easy-to-use online app and programming packages for Python and R.
In this publication, the authors compare different models for predicting text sentiment across 272 datasets. They also provide a model (“SiEBERT”, prefix for “Sentiment in English”) to predict sentiment. It enables reliable binary sentiment analysis for various types of English-language text, predicting either positive (1) or negative (0) sentiment for each instance. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). Consequently, it outperforms models trained on only one type of text (e.g., movie reviews from the popular SST-2 benchmark) when applied to new data, as the authors show.
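A minimal usage sketch, assuming the model's Hub id is siebert/sentiment-roberta-large-english:

```python
from transformers import pipeline

# Assumed Hub checkpoint for the SiEBERT model described above.
sentiment = pipeline("sentiment-analysis",
                     model="siebert/sentiment-roberta-large-english")

print(sentiment(["This product exceeded my expectations.",
                 "The service was disappointing."]))
```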
Visual content matters in business and marketing research, and image classification is key to understanding which visuals drive outcomes. CNNs have enabled large-scale labeling of images but often capture only local features, missing meanings that depend on context. New transformer-based vision models (TVMs) and vision-language models (VLMs) such as GPT-5 and Phi-4 better represent images through linguistic and relational structure, allowing them to match or surpass CNN performance across diverse marketing image tasks without additional task-specific training. Still, they can fail unexpectedly on certain labels. The most reliable approach is an ensemble that combines both paradigms, which consistently reduces classification errors. In practice: VLMs work extremely well out of the box, but for high-stakes applications or unknown task domains, combining VLMs with TVMs offers the strongest and most stable accuracy.
This research demonstrates how unstructured video data can be analyzed at scale to generate behavioral insights. Using automated video analytics and a large multimodal model applied to nearly 200,000 video segments, the study identifies how different types of hand movements influence persuasion. Results show that gestures serving to illustrate spoken content make messages easier to understand, increase perceived competence, and enhance persuasive impact. The work highlights the value of combining large-scale automated video processing with controlled experiments to uncover how nonverbal behavior shapes consumer responses.