Data Science – The Ai Vanguard


How to Set the Number of Trees in Random Forest

Data Science | June 11, 2025 | 7 Views | 0 Likes | 0 Comments

Scientific publication: T. M. Lange, M. Gültas, A. O. Schmitt & F. Heinrich (2025). optRF: Optimising random forest stability by determining the optimal number of trees. BMC Bioinformatics, 26(1), 95. Random Forest: A Powerful Tool for Anyone Working With Data. What is Random Forest? Have you ever wished you…

Survival Analysis When No One Dies: A Value-Based Approach

Data Science | June 11, 2025 | 12 Views | 0 Likes | 0 Comments

Survival Analysis is a statistical approach used to answer the question: “How long will something last?” That “something” could range from a patient’s lifespan to the durability of a machine component or the duration of a user’s subscription. One of the most widely used tools in this area is the Kaplan-Meier estimator. Born in the…

The Shadow Side of AutoML: When No-Code Tools Hurt More Than Help

Data Science | June 11, 2025 | 10 Views | 0 Likes | 0 Comments

AutoML has become the gateway drug to machine learning for many organizations. It promises exactly what teams under pressure want to hear: you bring the data, and we’ll handle the modeling. There are no pipelines to manage, no hyperparameters to tune, and no need to learn scikit-learn or TensorFlow; just click, drag, and deploy. At…

From a Point to L∞

Data Science | June 11, 2025 | 18 Views | 0 Likes | 0 Comments

Why you should read this: As someone who did a Bachelor’s in Mathematics, I was first introduced to L¹ and L² as measures of distance… now they seem to be measures of error. Where have we gone wrong? But jokes aside, there seems to be this misconception that L₁ and L₂ serve the same function, and…

NumExpr: The “Faster than Numpy” Library Most Data Scientists Have Never Used

Data Science | June 11, 2025 | 20 Views | 0 Likes | 0 Comments

Browsing GitHub the other day, I came across a library I’d never heard of before. It was called NumExpr. I was immediately interested because of some claims made about the library. In particular, it stated that for some complex numerical calculations, it was up to 15 times faster than NumPy. I was intrigued because, up…

Why Most Cyber Risk Models Fail Before They Begin

Data Science | June 11, 2025 | 16 Views | 0 Likes | 0 Comments

Cybersecurity leaders are being asked impossible questions. “What’s the likelihood of a breach this year?” “How much would it cost?” And “how much should we spend to stop it?” Yet most risk models used today are still built on guesswork, gut instinct, and colorful heatmaps, not data. In fact, PwC’s 2025 Global Digital Trust…

Load-Testing LLMs Using LLMPerf

Data Science | June 11, 2025 | 20 Views | 0 Likes | 0 Comments

Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten yet crucial part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level is the practice of…

Sesame Speech Model: How This Viral AI Model Generates Human-Like Speech

Data Science | June 11, 2025 | 21 Views | 0 Likes | 0 Comments

Recently, Sesame AI published a demo of their latest Speech-to-Speech model. It is a conversational AI agent that is really good at speaking: it provides relevant answers, speaks with expression, and honestly, is just very fun and interactive to play with. Note that a technical paper is not out yet, but they do have a…

A Data Scientist’s Guide to Docker Containers

Data Science | June 11, 2025 | 23 Views | 0 Likes | 0 Comments

For an ML model to be useful, it needs to run somewhere. This somewhere is most likely not your local machine. A not-so-good model that runs in a production environment is better than a perfect model that never leaves your local machine. However, the production machine is usually different from the one you developed the…

Linear Programming: Managing Multiple Targets with Goal Programming

Data Science | June 11, 2025 | 22 Views | 0 Likes | 0 Comments

This is the sixth (and likely last) part of a Linear Programming series I’ve been writing. With the core concepts covered by the prior articles, this article focuses on goal programming, which is a less common linear programming (LP) use case. Goal programming is a specific linear programming setup that can handle the optimization of…