Are We Overestimating AI's Potential in Logistics?
Fr(AI)ght Perspectives - A gentle entry to Machine Learning for decision makers in logistics
Have you ever wondered what AI means for your logistics business or daily transportation operations? Or perhaps why a seemingly simple AI project failed? Early failures of AI projects are increasingly overshadowing the general AI euphoria. AI projects present unique challenges to many companies because they differ significantly from traditional software development. To understand these challenges, a fundamental understanding of AI and Machine Learning (ML) is crucial. Although most people are more familiar with the term 'AI', we prefer the term 'machine learning' in this series. This is because machine learning is more precisely defined, focuses on learning algorithms, and avoids philosophical debates about intelligence.
With this in mind, we aim to kick off this series by explaining specific aspects and potential pitfalls of implementing ML projects. We will avoid delving into the details of any particular algorithm. Instead, we will discuss relevant aspects in a general manner. It is not necessary to understand the math behind AI. However, it is often helpful to have a strong intuition when it comes to machine learning.
In this article, we want to show how AI fundamentally works using a very simple example. Let's imagine we want to predict transport prices. In reality, numerous factors like distance, weight, dimensions, type of transport, market situation, etc. are necessary for this. However, in this example, we will solely focus on distance. The rough relationship between the two variables is clear. The further the transport, the more expensive it will be. But how expensive will it be exactly?
To answer this, we plot our data. Each point, or truck, represents a single transport. The x-axis shows the distance and the y-axis the price. Through this representation, we immediately see that the price increases for longer transports.
Relationship between transport prices and distance
But how much exactly will the transport cost? The simplest approach to tackle this question would be to see if there is another transport with a similar distance in the data. If so, we could use its price as a prediction for our current transport. But does this method make sense?
What happens, for instance, if this ‘similar’ transport was exceptionally cheap? Our prediction would then be too low. Couldn't we therefore use the information from other points as well? But how?
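The "closest transport" idea can be sketched in a few lines of Python. This is only an illustration: the transports list holds made-up numbers (they are not taken from the data behind the chart), and the function name predict_by_nearest is our own choice.

```python
# Hypothetical past transports as (distance in km, price in EUR).
# The numbers are invented for illustration; the 480 km transport
# is deliberately an unusually cheap one.
transports = [
    (100, 250.0),
    (300, 520.0),
    (480, 310.0),
    (700, 1010.0),
    (900, 1290.0),
]


def predict_by_nearest(distance_km, history):
    """Return the price of the past transport with the closest distance."""
    nearest = min(history, key=lambda t: abs(t[0] - distance_km))
    return nearest[1]


# The prediction for 500 km simply copies the price of the 480 km transport.
print(predict_by_nearest(500, transports))
```

Because the prediction depends on a single past transport, one exceptionally cheap (or expensive) neighbour is copied straight into the result.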
This is where machine learning comes into play. A simple approach for this problem is linear regression. For this approach, we draw a straight line through our points. The line is positioned to best represent the cloud of points.
Relationship between transport prices and distance with linear model
We can now use this line to predict our price a bit better. For instance, if we want to predict the cost of a 500 km transport, we read off the value on the line at 500 km. This method is much more robust than simply using the closest transport because it also incorporates information from points that are farther away.
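For readers who like to see this in code, here is a minimal sketch of fitting such a line with ordinary least squares, a common way to position a line through a cloud of points. The article does not state which fitting method was used for the chart, so treat this as one reasonable assumption. The data and the names fit_line and predict are hypothetical.

```python
# Hypothetical past transports as (distance in km, price in EUR).
# Invented numbers; the 480 km transport is unusually cheap.
transports = [
    (100, 250.0),
    (300, 520.0),
    (480, 310.0),
    (700, 1010.0),
    (900, 1290.0),
]


def fit_line(points):
    """Fit price = intercept + slope * distance by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(distance_km, slope, intercept):
    """Read the price off the fitted line at the given distance."""
    return intercept + slope * distance_km


slope, intercept = fit_line(transports)
print(predict(500, slope, intercept))
```

With these made-up numbers, the prediction for 500 km ends up far above the 310 EUR of the unusually cheap 480 km transport, because every point contributes to the position of the line.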
What does all this have to do with machine learning or AI? The line that we have drawn can be considered an ML model. It takes an input (the distance) and provides a prediction (the price). Thus, our line is a fully-fledged ML model. The biggest difference from real-life ML models is the complexity. It is still a big step from this simple linear model to large language models like GPT or Gemini. However, aside from that complexity, this very simple example serves as a good introduction to the fundamentals of ML models.
What are the take-aways from this example?
In practice, ML models are much more complex than our model. In the field of ML, complexity is one of the decisive factors. Nevertheless, even from this simple example, some insights can be gained that can help us in dealing with more complex models.
ML models are trained on data. This statement seems trivial but has far-reaching consequences. These consequences are often financial in nature because ML models require a lot of data of good quality.
Clean data is the foundation for effective machine learning. Machine learning is not magic. If data quality was already an issue before implementing ML, it will become an even bigger problem with ML. While most ML models can compensate for erroneous data to some extent, the fundamental rule is: the better the data, the better the model. Therefore, it is crucial to clean up the data before starting ML projects. This process can pose significant challenges, particularly for small and medium-sized businesses. As data quality and data quantity play a paramount role in ML projects, we will discuss them in more detail in separate articles.
ML models are (mostly) an approximation and do not perfectly represent data. In our example, not all points lie directly on the line. If we use this model to calculate the price for a 1000 km transport, the result deviates from the observed values. The data is only approximated by the line, a characteristic nearly all ML algorithms share.
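The gap between the line and the observed prices can be made visible by computing, for every transport, the difference between the observed price and the price on the line. The sketch below uses invented data and a least-squares fit (both are assumptions for illustration, not the figures behind the chart).

```python
# Hypothetical observations as (distance in km, observed price in EUR).
observations = [
    (200, 330.0),
    (400, 600.0),
    (600, 790.0),
    (800, 1100.0),
    (1000, 1280.0),
]

# Ordinary least-squares fit of price = intercept + slope * distance.
n = len(observations)
mean_x = sum(x for x, _ in observations) / n
mean_y = sum(y for _, y in observations) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in observations)
    / sum((x - mean_x) ** 2 for x, _ in observations)
)
intercept = mean_y - slope * mean_x

# Deviation of each observed price from the line (observed minus predicted).
residuals = [y - (intercept + slope * x) for x, y in observations]

for (distance, price), residual in zip(observations, residuals):
    print(f"{distance:>5} km  observed {price:8.2f}  deviation {residual:+8.2f}")
```

None of the deviations is zero here, including the one for the 1000 km transport: the line summarises the data, it does not reproduce it.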
The ability to approximate is both a benefit and a drawback. The benefit is that ML models can abstract by, for example, ignoring outliers. The drawback arises when ML models 'ignore' relationships in the data that truly exist. In our example, this could be transports over short distances where the flat rate outweighs the per-kilometer charge. Our linear model cannot accurately represent this more complex situation. It abstracts too much and sets the price for short transports too low. This is called underfitting. In such cases, the data scientist's intuition is vital for choosing the right models and achieving an appropriate level of abstraction.
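The flat-rate situation can be reproduced with a small experiment. The pricing rule below (a flat rate of 150 EUR or 1 EUR per kilometer, whichever is higher) is purely hypothetical and only serves to generate example data; the names true_price and line_price are our own.

```python
FLAT_RATE_EUR = 150.0   # hypothetical minimum price per transport
PER_KM_EUR = 1.0        # hypothetical per-kilometer charge


def true_price(distance_km):
    """Hypothetical pricing rule: the flat rate applies to short transports."""
    return max(FLAT_RATE_EUR, PER_KM_EUR * distance_km)


distances = [20, 50, 100, 200, 400, 600, 800, 1000]
points = [(d, true_price(d)) for d in distances]

# Ordinary least-squares fit of a single straight line to all points.
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in points)
    / sum((x - mean_x) ** 2 for x, _ in points)
)
intercept = mean_y - slope * mean_x


def line_price(distance_km):
    """Price predicted by the straight line."""
    return intercept + slope * distance_km


# Compare the line with the pricing rule for a short and a long transport.
for d in (20, 1000):
    print(f"{d:>5} km  rule {true_price(d):8.2f}  line {line_price(d):8.2f}")
```

In this sketch the straight line prices the 20 km transport clearly below the flat rate, which is the underfitting described above. A model that can represent the kink between the flat rate and the per-kilometer charge would be needed to capture it.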
Some points discussed in this article may seem trivial. However, when introducing ML projects into new business areas, it is advisable to envision the operation of ML models through simple images. Since this is easier said than done, we aim in this series to break down individual aspects of machine learning into simple images, facilitating strategic ML decision-making.
Martin Bastian
Data Scientist
If you have questions or suggestions for new topics that we should cover in this series, please reach out to us or comment below.