Deep Tabular Augmentation for Regression Tasks

Lasse Schmidt
Apr 23, 2022

So far, I have shown how to use deep learning for data augmentation in order to create data for an underrepresented class in your dataset (see my earlier posts). However, the Deep Tabular Augmenter can be used for more than just that. In this blog post, I want to show how you can use the technique to create data for regression tasks.

First, we need a dataset. Here, I use the well-known Boston Housing Dataset. After loading, the data looks like this:

The data has only 506 entries, which makes it a good example of how the Deep Tabular Augmenter works when data is scarce. Unlike in my other blog posts about the Deep Tabular Augmenter, we now want to create data for the complete dataset rather than for a single class. This dataset is usually used to predict MEDV, the median value of owner-occupied homes in $1000s. Let’s first create train and test datasets:
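As a minimal sketch of this step, the split could look like the following. The DataFrame here is a random stand-in with a handful of the Boston column names (not the real values), and the 80/20 ratio and the seed are my assumptions, not necessarily what was used for this post.

```python
import numpy as np
import pandas as pd

# Stand-in for the Boston Housing data: 506 rows and a few of the original
# column names, filled with random numbers (NOT the real values).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "CRIM": rng.exponential(3.6, size=506),
    "RM": rng.normal(6.3, 0.7, size=506),
    "CHAS": rng.integers(0, 2, size=506),
    "RAD": rng.integers(1, 25, size=506),
    "MEDV": rng.normal(22.5, 9.2, size=506).clip(5, 50),
})

# Hold out 20% of the rows for testing (assumed ratio). MEDV stays in both
# frames because the augmenter should learn it like any other column.
train_df = df.sample(frac=0.8, random_state=42)
test_df = df.drop(train_df.index)

print(train_df.shape, test_df.shape)
```

Note that MEDV is deliberately not separated from the features at this point.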

Then we put the data into dataloaders so that the Deep Tabular Augmenter can work with it. This time we use a slightly different dataloader function: dta.create_datasets_no_target_var. It is a convenience function for cases without a target variable. Usually, for example in the fraud-detection case, we have a specific target variable for which we do not want to create fake data. Here it is different: we also want to create fake data for MEDV, the variable we would later use as the regression target.
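To illustrate the idea of "no target variable", here is a rough, library-independent sketch. It does not reproduce the signature or internals of dta.create_datasets_no_target_var; the helpers scale_all_columns and iter_batches are hypothetical names I made up for illustration, and the data is random. The point is only that every column, MEDV included, is scaled and batched together.

```python
import numpy as np
import pandas as pd

def scale_all_columns(train_df, test_df):
    """Hypothetical helper: standardize every column, MEDV included."""
    mean = train_df.mean()
    std = train_df.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    train_scaled = ((train_df - mean) / std).to_numpy(dtype=np.float32)
    test_scaled = ((test_df - mean) / std).to_numpy(dtype=np.float32)
    return train_scaled, test_scaled, mean, std

def iter_batches(data, batch_size, rng):
    """Yield shuffled mini-batches of rows, roughly what a dataloader does."""
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[order[start:start + batch_size]]

# Random stand-in frames; the column names are borrowed from the Boston data.
rng = np.random.default_rng(0)
cols = ["RM", "LSTAT", "MEDV"]
train_df = pd.DataFrame(rng.normal(size=(405, 3)), columns=cols)
test_df = pd.DataFrame(rng.normal(size=(101, 3)), columns=cols)

train_x, test_x, mean, std = scale_all_columns(train_df, test_df)
batches = list(iter_batches(train_x, batch_size=64, rng=rng))
print(len(batches), batches[0].shape)
```

The test data is scaled with the training statistics, so no information from the held-out rows leaks into training.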

Also, make sure you have at least version 0.5.0 of deep_tabular_augmentation; earlier versions do not include this function. If you need to upgrade, run:

pip install deep_tabular_augmentation --upgrade

From now on, everything should be very familiar. Let’s start with the basic architecture of our model:

Put everything data-related into the learner class:

Set up a learning rate schedule of your choice and train the model:

Let’s have a look at the created data:

As RAD and CHAS are not continuous, let’s just round them to the nearest integer.
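A minimal sketch of the rounding step with pandas, assuming the generated rows live in a DataFrame. The name df_fake and its random contents are placeholders for whatever the augmenter returned:

```python
import numpy as np
import pandas as pd

# Placeholder for the generated data: continuous values in every column.
rng = np.random.default_rng(1)
df_fake = pd.DataFrame({
    "CHAS": rng.uniform(-0.2, 1.2, size=5),
    "RAD": rng.uniform(0.5, 24.4, size=5),
    "MEDV": rng.normal(22.5, 9.2, size=5),
})

# Round only the discrete columns and store them as integers;
# continuous columns such as MEDV are left untouched.
discrete_cols = ["CHAS", "RAD"]
for col in discrete_cols:
    df_fake[col] = df_fake[col].round().astype(int)

print(df_fake)
```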

Now that we have that, let’s move on to the evaluation. Training is all well and good, but we want to see some results, so let’s plot them:
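Besides plotting, a quick numeric check can help when comparing real and generated data. The sketch below is my own addition, not the plot from this post: it compares per-column means and standard deviations and the gap between the two correlation matrices. Both frames are random stand-ins, so the printed numbers say nothing about the augmenter itself.

```python
import numpy as np
import pandas as pd

# Random stand-ins for the real training data and the generated data.
rng = np.random.default_rng(7)
cols = ["RM", "LSTAT", "MEDV"]
loc, scale = [6.3, 12.6, 22.5], [0.7, 7.1, 9.2]
df_real = pd.DataFrame(rng.normal(loc, scale, size=(405, 3)), columns=cols)
df_fake = pd.DataFrame(rng.normal(loc, scale, size=(1000, 3)), columns=cols)

# Side-by-side summary statistics, one row per column.
summary = pd.DataFrame({
    "real_mean": df_real.mean(),
    "fake_mean": df_fake.mean(),
    "real_std": df_real.std(),
    "fake_std": df_fake.std(),
})

# Absolute difference between the correlation matrices: small values mean
# the generated data preserves the relationships between columns.
corr_gap = (df_real.corr() - df_fake.corr()).abs()

print(summary.round(2))
print(corr_gap.round(2))
```

For a regression use case, the row and column for MEDV in corr_gap are the most interesting ones, since they show whether the relationship between features and target survived.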

I think that looks pretty awesome! So remember: whenever you do not have enough data for your model to train properly, you can try the Deep Tabular Augmenter to create additional fake data, so that your model is actually able to learn something. I hope this helps. Stay tuned for more!

If you have any questions or want anything added to the package, just ask me.

Lasse

Originally published at https://lschmiddey.github.io on April 23, 2022.
