The following experiment aims to show that training a neural network with the Iterative Hard Thresholding (IHT) algorithm does not produce structured sparsity. Structured sparsity means that entire columns or rows of the weight matrix are zero.
To illustrate this, we use the Iris dataset, which contains 150 data points representing three different flower species. Each sample includes four numerical features: sepal length, sepal width, petal length, and petal width, as shown in the picture.
The three Iris species. Measurements are shown only for Iris Versicolor.
We exclude one sample from the dataset, which is:
\[
\mathbf{x} = [6.4, 3.2, 4.5, 1.5]^T.
\]
This sample belongs to the versicolor class, i.e.,
\[
\text{original sample class} = 2 \;(\text{versicolor}).
\]
Thus, we have 149 samples in total. Splitting the data into 80% training and 20% testing (inference) gives roughly 119 training and 30 test samples.
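For concreteness, the data preparation might look like the following sketch, assuming scikit-learn's copy of the Iris dataset (note that scikit-learn labels the classes 0–2, so versicolor is class 1 there, while the text counts from 1; the `random_state` is an arbitrary choice):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the full Iris dataset: 150 samples, 4 features each.
X, y = load_iris(return_X_y=True)

# Remove the held-out sample x = [6.4, 3.2, 4.5, 1.5] (a versicolor flower).
held_out = np.array([6.4, 3.2, 4.5, 1.5])
keep = ~np.all(np.isclose(X, held_out), axis=1)
X, y = X[keep], y[keep]  # 149 samples remain

# 80/20 split: roughly 119 training and 30 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```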
We consider a one-layer neural network architecture, as shown below. The network has four inputs and three outputs, and we train it on the roughly 119 training points.
A single-layer neural network with 4 inputs and 3 outputs.
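One way to realize this architecture, continuing the sketch above, is a plain linear map from the four features to three class scores (the random initialization scale is an assumption, not taken from the experiment):

```python
rng = np.random.default_rng(0)

# Single-layer network: a 4x3 weight matrix (one row per input feature,
# one column per output class) plus one bias per class.
W = rng.normal(scale=0.1, size=(4, 3))
b = np.zeros(3)

def forward(X, W, b):
    """Class scores for a batch of samples; the argmax is the predicted class."""
    return X @ W + b
```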
We initialize the network by setting the sparsity level to 5, i.e., \(s = 5\). The initial weights are shown in the figure below.
Feeding the held-out sample to the trained network, the predicted class is 2, which matches the true class!
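The training loop itself is not shown in the text. A standard IHT scheme alternates a gradient step with a hard-thresholding projection that keeps only the \(s\) largest-magnitude weights; here is a minimal sketch under that assumption, continuing the code above (the learning rate, step count, and the choice to threshold only \(W\) are illustrative):

```python
def hard_threshold(W, s):
    """Keep the s largest-magnitude entries of W and zero out the rest
    (ties at the cutoff may keep a few extra entries)."""
    cutoff = np.partition(np.abs(W).ravel(), -s)[-s]
    return np.where(np.abs(W) >= cutoff, W, 0.0)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def iht_train(X, y, W, b, s=5, lr=0.1, steps=2000):
    Y = np.eye(3)[y]                      # one-hot targets
    for _ in range(steps):
        P = softmax(forward(X, W, b))     # predicted class probabilities
        grad_W = X.T @ (P - Y) / len(X)   # softmax cross-entropy gradient
        grad_b = (P - Y).mean(axis=0)
        W, b = W - lr * grad_W, b - lr * grad_b
        W = hard_threshold(W, s)          # IHT projection onto s-sparse matrices
    return W, b

W, b = iht_train(X_train, y_train, W, b, s=5)
# scikit-learn's label 1 is versicolor (the text's class 2 under one-based counting).
pred = forward(held_out[None, :], W, b).argmax()
```

Projecting after every gradient step is the defining move of IHT: the iterates always stay exactly \(s\)-sparse, but nothing in the projection encourages the surviving entries to line up in a common row or column.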
As you can see, no row or column of the trained weight matrix is completely zero, which confirms that the sparsity of the trained network has no structure. This becomes even clearer when we look at the weights and biases of the trained network: as the picture shows, every input and output neuron retains at least one connection.
The unstructured sparsity pattern after training the network with the IHT algorithm.
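The absence of structure is easy to verify programmatically; a quick check on the trained matrix `W` from the sketch above:

```python
# An all-zero row of W would mean an input feature is completely disconnected;
# an all-zero column would mean an output class receives no input at all.
zero_rows = np.where(np.all(W == 0, axis=1))[0]
zero_cols = np.where(np.all(W == 0, axis=0))[0]
print("all-zero rows:", zero_rows)     # empty, per the result above
print("all-zero columns:", zero_cols)  # empty, per the result above
```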
On the other hand, the following trained network achieves 96.67% accuracy on the test set while having two completely zero columns.