AI and Machine Learning

The Time Oracle: Decoding Time Series Mysteries with Transformers

This article explores the Autoformer, a Transformer-based model designed for time series forecasting. We will examine what sets Autoformers apart from standard Transformers, how they improve upon traditional methods for time series analysis, and their practical uses across industries.

Natalia Pattarone
July 18, 2024

The term "Transformers" is well-known today Transformers have significantly broadened the potential for computational models by enhancing their ability to process data in parallel, thus improving efficiency and performance. In NLP, the success of Transformers has been most notable, leading to the development of Large Language Models (LLMs) such as the ChatGPT, Claude, and many others. These models excel in understanding and generating human-like text, making them vital for applications ranging from interactive chatbots to advanced text analysis.


What is Time Series Forecasting?

Time series data is a sequence of data points collected or recorded at specific time intervals, for instance, monitoring your daily steps using a fitness tracker to analyze activity patterns over time. This type of data is common in various fields, such as finance, weather, healthcare, and retail, where tracking changes over time is crucial for analysis and decision-making.

Traditionally, time series forecasting has relied on models like ARIMA (AutoRegressive Integrated Moving Average) and Exponential Smoothing. ARIMA models use past values and the relationships between them to predict future points. Exponential Smoothing methods apply weighted averages of past observations, meaning they give more importance to recent data points while still considering older data. This helps to smooth out fluctuations and highlight trends more clearly.
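
To make the weighted-average idea concrete, below is a minimal sketch of simple exponential smoothing on a made-up daily-steps series (the smoothing factor alpha is an arbitrary choice for illustration):

import numpy as np

def exponential_smoothing(series, alpha=0.3):
    # Each smoothed point is a weighted average of the current observation
    # and the previous smoothed value, so recent data counts more.
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return np.array(smoothed)

# Toy daily-steps series with one spike
steps = np.array([8000, 8200, 7900, 12000, 8100, 8300, 8250])
print(exponential_smoothing(steps))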

But as usual, not all that glitters is gold, and traditional methods face several challenges. They often struggle with handling large volumes of data and capturing complex patterns, especially when dealing with non-linear relationships and long-term dependencies. Plus, these models usually require significant manual tuning and domain expertise to achieve accurate forecasts, limiting their scalability and adaptability to diverse datasets.

Transformers

Unlike older models, such as Recurrent Neural Networks (RNNs), Transformers can process entire sequences of data at once, rather than piece by piece. This makes them incredibly fast and efficient, especially when dealing with large datasets.

What makes Transformers special is their self-attention mechanism. This allows the model to dynamically focus on the most relevant parts of the input data. Think of it like having a built-in highlighter that enables the model to focus on important details. This feature is crucial for capturing long-range dependencies and complex patterns, areas where traditional models often struggle.

Transformers X-Ray

Now, let us try to understand a bit more about what makes Transformers special by walking through their architecture and mechanisms.

Figure 1. Overall architecture of Transformers.

Encoder-Decoder Structure

Transformers use an encoder-decoder structure, where:

  • Encoder: This part takes in the input data and turns it into an internal format that the model can understand.
  • Decoder: This part takes the internal format from the encoder and turns it into the final output data.

In time series forecasting, we usually focus more on the encoder part. But it's good to know how both parts work together to make the magic happen.

Self-Attention Mechanism

Here it is, the heart of it all: the self-attention mechanism. In simpler words, this amazing system helps the model decide which parts of the input data are most important. Let’s break it down:

  • Query (Q): Think of this as the part of the data we're currently interested in.
  • Key (K): These are all the other parts of the data that might be relevant.
  • Value (V): This is the actual information or content associated with each part of the data.

Here’s how it works: the model takes the Query and compares it to all the Keys. This comparison produces attention scores, which tell the model which Values (pieces of data) are the most important to focus on. In simple terms, it’s like reading a book and using a highlighter to mark the important sentences.
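
To make this concrete, here is a minimal sketch of scaled dot-product self-attention in PyTorch (the tensor sizes are arbitrary, and this is an illustration rather than the exact implementation inside any particular library):

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Compare each Query with every Key, turn the scores into weights,
    # and use those weights to mix the Values.
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # attention scores
    weights = F.softmax(scores, dim=-1)            # normalized importance
    return weights @ V                             # weighted sum of the Values

# Toy example: a sequence of 5 time steps, each embedded in 8 dimensions
x = torch.randn(1, 5, 8)
out = scaled_dot_product_attention(x, x, x)        # self-attention: Q = K = V = x
print(out.shape)                                   # torch.Size([1, 5, 8])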

Positional Encoding

One key difference from their predecessors, RNNs, is that instead of processing each piece of information sequentially, a Transformer processes the entire sequence at once. Because of that, it loses the ability to know the original order of the data points on its own. This is where positional encoding comes in.

Positional encoding adds information about the order of the data points to the input. It uses patterns (sine and cosine functions) to give each position a unique code. This way, the model knows, for example, which words in a sentence come first, second, and so on; for time series, it preserves the natural temporal order of the measurements.
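
Here is a minimal sketch of that sine/cosine encoding (the sequence length and model dimension are arbitrary choices for illustration):

import torch

def positional_encoding(seq_len, d_model):
    # Give each position a unique pattern of sines and cosines.
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-torch.log(torch.tensor(10000.0)) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Encode 6 time steps in an 8-dimensional model; this is added to the inputs
print(positional_encoding(seq_len=6, d_model=8).shape)  # torch.Size([6, 8])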

Multi-Head Attention

To capture different types of relationships in the data, Transformers use a technique called multi-head attention. It is like having multiple pairs of eyes, each looking at the data from a different angle. Each "eye" (or head) focuses on different parts of the data, for example, when analyzing a sentence, one head might focus on the relationships between subjects and verbs, while another might focus on the connections between adjectives and nouns. These multiple perspectives are then combined to form a comprehensive view, allowing the model to understand complex patterns and relationships much better than looking at it from just one angle.
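
PyTorch ships a ready-made nn.MultiheadAttention module, so a small sketch of the idea can be as short as this (the embedding size and number of heads are arbitrary choices):

import torch
import torch.nn as nn

# 8-dimensional embeddings split across 2 attention heads (4 dimensions per head)
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

x = torch.randn(1, 5, 8)              # batch of 1, sequence of 5 time steps
out, attn_weights = mha(x, x, x)      # self-attention: query = key = value = x
print(out.shape, attn_weights.shape)  # (1, 5, 8) and (1, 5, 5)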

Feed-Forward Networks

After the multi-head attention has done its job, the data moves through a feed-forward network. This unit further refines the data by passing it through two layers of transformation, with a ReLU activation in between that adds a bit of flexibility and complexity. This extra processing helps the Transformer to make more accurate and detailed predictions.
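
A minimal sketch of that position-wise feed-forward block, with illustrative dimensions:

import torch
import torch.nn as nn

# Two linear transformations with a ReLU in between, applied to every position
feed_forward = nn.Sequential(
    nn.Linear(8, 32),   # expand the 8-dimensional representation
    nn.ReLU(),          # non-linearity adds flexibility
    nn.Linear(32, 8),   # project back down to the model dimension
)

x = torch.randn(1, 5, 8)      # output of the multi-head attention block
print(feed_forward(x).shape)  # torch.Size([1, 5, 8])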

Layer Normalization and Residual Connections

To make the training process smoother and faster, Transformers use two key techniques: layer normalization and residual connections. A small sketch combining both follows the list below.

  • Layer Normalization: This adjusts the data within each layer so that it has a consistent scale and distribution, which helps the model learn more effectively.
  • Residual Connections: These act like shortcuts, adding the input of a layer to its output. This helps prevent problems during training, such as the vanishing gradient problem, and makes it easier for the model to learn complex patterns without getting stuck.
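
Here is a minimal sketch of how the two are commonly combined around a sub-layer such as attention or the feed-forward network (a post-norm arrangement; real implementations vary in the details):

import torch
import torch.nn as nn

d_model = 8
layer_norm = nn.LayerNorm(d_model)
sub_layer = nn.Linear(d_model, d_model)  # stand-in for attention or feed-forward

x = torch.randn(1, 5, d_model)
out = layer_norm(x + sub_layer(x))       # residual shortcut, then normalization
print(out.shape)                         # torch.Size([1, 5, 8])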

Introducing Autoformers: Tailoring Transformers for Time Series

Autoformers are a specialized evolution of Transformers designed specifically for long-term time series forecasting. Below is a small breakdown of its key pieces and architecture so we can understand what makes it stand out. For further details, you can access the original paper here.

Figure 2. Overall architecture of Autoformer.

Series Decomposition Block

Autoformer integrates series decomposition directly into its architecture. This block separates the input data into trend and seasonal components, allowing the model to focus on these distinct patterns separately.

  • Trend Component: Represents the long-term progression in the data.
  • Seasonal Component: Captures repeating patterns and fluctuations.

This method is distinctly different from traditional Transformers, which typically process data without such decomposition, as other types of data, like text or images, do not inherently possess these attributes.
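
To make this concrete, here is a hedged sketch of the decomposition idea: a moving average extracts the trend, and whatever remains is treated as the seasonal part (the kernel size and toy series are arbitrary, and this simplifies the block described in the paper):

import torch
import torch.nn.functional as F

def series_decomposition(x, kernel_size=25):
    # x: (batch, length). Pad so the moving average keeps the original length.
    pad = (kernel_size - 1) // 2
    padded = F.pad(x.unsqueeze(1), (pad, kernel_size - 1 - pad), mode="replicate")
    trend = F.avg_pool1d(padded, kernel_size, stride=1).squeeze(1)  # smooth trend
    seasonal = x - trend                                            # what is left over
    return seasonal, trend

# Toy series: a sine wave (seasonality) on top of a rising line (trend)
x = torch.sin(torch.linspace(0, 12.56, 100)).unsqueeze(0) + torch.linspace(0, 1, 100)
seasonal, trend = series_decomposition(x)
print(seasonal.shape, trend.shape)  # torch.Size([1, 100]) each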

Auto-Correlation Mechanism

Instead of relying on the self-attention mechanism used in traditional Transformers, Autoformer employs an Auto-Correlation mechanism. This technique takes advantage of the periodic nature of time series data to find dependencies and aggregate information more efficiently. It does so by identifying and aggregating similar sub-series based on their periodicity, reducing computational complexity to O(L log L). This is a major improvement over the quadratic O(L²) complexity of standard self-attention, because near-linear growth means the computational resources required increase much more slowly as the sequence length grows. In practical terms, Autoformer can handle longer series and larger datasets more efficiently, making it faster and more scalable.

Figure 3. Auto-Correlation mechanism.
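
As a greatly simplified, hedged sketch of the idea: estimate the autocorrelation with an FFT, keep the top-k delays, and aggregate time-shifted copies of the series weighted by their scores (the real mechanism operates per attention head on queries, keys, and values):

import torch

def autocorrelation_aggregate(x, top_k=3):
    # x: (batch, length). Wiener-Khinchin: autocorrelation is the inverse FFT
    # of the power spectrum, which is much cheaper than comparing all pairs.
    spectrum = torch.fft.rfft(x, dim=-1)
    autocorr = torch.fft.irfft(spectrum * torch.conj(spectrum), n=x.size(-1), dim=-1)

    # Keep the k delays with the strongest correlation (averaged over the batch)
    delays = torch.topk(autocorr.mean(dim=0), top_k, dim=-1)
    weights = torch.softmax(delays.values, dim=-1)

    # Aggregate the series shifted by each selected delay, weighted by its score
    out = torch.zeros_like(x)
    for weight, delay in zip(weights, delays.indices):
        out += weight * torch.roll(x, shifts=int(delay), dims=-1)
    return out

x = torch.sin(torch.linspace(0, 25.13, 96)).unsqueeze(0)  # toy periodic series
print(autocorrelation_aggregate(x).shape)                 # torch.Size([1, 96])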

Efficient Encoding and Decoding

The encoder in Autoformer is designed to model the seasonal part of the data, eliminating the long-term trend during processing. Meanwhile, the decoder progressively refines the trend predictions, using information from the encoder to improve accuracy.

Practical Application: Using Autoformer with Hugging Face

To see Autoformer in action, we'll use the Hugging Face library to implement it for a real-world scenario. The example uses a pre-trained model trained on the dataset from the paper The tourism forecasting competition, which can be found in the Monash Time Series Forecasting Repository on Hugging Face. You can also access the original GitHub repository from the paper here.

  1. Installation and Setup

I always recommend using a virtual environment, like conda, to set up all the necessary libraries for your project. This example uses Python 3.10.12.

pip install torch
pip install transformers
pip install seaborn matplotlib
  2. Import all necessary libraries
import torch
import seaborn as sns
import matplotlib.pyplot as plt

from huggingface_hub import hf_hub_download
from transformers import AutoformerForPrediction

sns.set_style('darkgrid')
  3. Download the dataset, load the model and perform an inference (prediction)
# Download and load the dataset
file = hf_hub_download(
    repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
)
batch = torch.load(file)

# Load the pre-trained Autoformer model
model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly")

# During training, one provides both past and future values
# as well as possible additional features
outputs = model(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"],
    future_values=batch["future_values"],
    future_time_features=batch["future_time_features"],
)

loss = outputs.loss
loss.backward()

# During inference, one only provides past values
# as well as possible additional features
# the model autoregressively generates future values
outputs = model.generate(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"],
    future_time_features=batch["future_time_features"],
)

mean_prediction = outputs.sequences.mean(dim=1)
  4. Visualize your results!

The most exciting part is checking how well the model actually works, and we do that by matching the ground truth (the real future values) against the values the model predicted using only the past data.

# Plotting
past_values = batch["past_values"].squeeze().numpy()
future_values = batch["future_values"].squeeze().numpy()
predicted_values = mean_prediction.squeeze().detach().numpy()

# Single sample for simplicity
past_values = past_values[0]
future_values = future_values[0]
predicted_values = predicted_values[0]

plt.figure(figsize=(12, 6))
plt.plot(range(len(past_values)), past_values, label='Past Values')
plt.plot(range(len(past_values), len(past_values) + len(future_values)), future_values, label='True Future Values')
plt.plot(range(len(past_values), len(past_values) + len(predicted_values)), predicted_values, label='Predicted Future Values', linestyle='dashed')
plt.legend()
plt.xlabel('Time')
plt.ylabel('Values')
plt.title('Original Time Series and Predictions')
plt.show()

And here is the resulting plot. As you can see, the model did a pretty good job of forecasting the amount of tourism expected in the future. Of course, minor discrepancies are expected, and usually the more acute or sudden the fluctuations are, the more difficult it is for the model to generalize well and predict with perfect accuracy. Nevertheless, we should be really proud of our model!

Figure 4: Original Time Series and Predictions

Final Words

Transformers have truly revolutionized the field of machine learning, and their impact on time series forecasting is no exception. By introducing innovations like the Autoformer model, we can now handle long-term dependencies and complex patterns in data with unprecedented efficiency and accuracy. Autoformer, with its unique and innovative mechanisms, provides a significant leap forward from traditional transformers. Its ability to manage the inherent periodicity of time series data, along with the reduced computational complexity, makes it an invaluable tool for various applications—from predicting stock prices to forecasting weather patterns and beyond.

The practical example using Hugging Face demonstrated how easily we can implement and visualize the power of Autoformer in real-world scenarios. Through the application of these innovative models, we can make more informed decisions and strategic plans, by going above and beyond what's currently possible in data analysis.

In conclusion, my dear reader, the Autoformer stands as a testament to the ongoing innovation in the field, showing us that with the right tools, we can predict the future more accurately than ever before. So, as you dive into your next time series project, remember the power of these models. Keep experimenting, stay curious, and embrace the advancements that are reshaping the landscape of data forecasting. The future never looked brighter. Happy coding!
