Deep Linear Probe Generators for Weight Space Learning

Hebrew University of Jerusalem
An overview of our method

We optimize a deep linear probe generator to create suitable probes for the model. That is, our generator includes no activations between its linear layers, yet stacking several linear layers imposes a desired structure on the probes. We then gather the model's responses over all probes and train a classifier to predict an attribute of interest about the model.
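As a rough illustration (not the paper's exact implementation), a deep linear generator can be sketched as stacked linear layers with no nonlinearities in between; the composition is still a linear map, but the depth changes the optimization dynamics. All names, dimensions, and hyperparameters below are illustrative assumptions.

import torch
import torch.nn as nn

class DeepLinearGenerator(nn.Module):
    # Hypothetical sketch: maps a fixed learned latent per probe through
    # stacked linear layers with NO activations between them. The stack is
    # shared across all probes, biasing them towards structured solutions.
    def __init__(self, n_probes=64, latent_dim=128, probe_dim=3 * 32 * 32, depth=3):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_probes, latent_dim))
        dims = [latent_dim] * depth + [probe_dim]
        self.layers = nn.Sequential(
            *[nn.Linear(dims[i], dims[i + 1]) for i in range(depth)])

    def forward(self):
        # Returns one probe per latent, shape (n_probes, probe_dim).
        return self.layers(self.latents)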

Abstract

Weight space learning aims to extract information about a neural network, such as its training dataset or generalization error. Recent approaches learn directly from model weights, but this presents many challenges, as weights are high-dimensional and include permutation symmetries between neurons. An alternative approach, probing, represents a model by passing a set of learned inputs (probes) through the model and training a predictor on top of the corresponding outputs. Although probing is typically not used as a standalone approach, our preliminary experiments found that a vanilla probing baseline works surprisingly well. However, we discover that current probe learning strategies are ineffective. We therefore propose Deep Linear Probe Generators (ProbeGen), a simple and effective modification to probing approaches. ProbeGen adds a shared generator module with a deep linear architecture, providing an inductive bias towards structured probes and thus reducing overfitting. While simple, ProbeGen performs significantly better than the state of the art and is very efficient, requiring 30 to 1,000 times fewer FLOPs than the other top approaches.

Our hypothesis is that probing methods, when done right, hold significant potential. Drawing inspiration from binary code analysis, where dynamic approaches are more common than static ones, we believe that running neural networks, i.e., probing, is a promising approach for weight space learning. We begin with two preliminary experiments to test the quality and potential of probing approaches (a sketch of the vanilla probing baseline follows the list):

  1. Comparing a vanilla probing baseline to previous graph-based and mechanistic approaches. With enough probes: (a) vanilla probing outperforms graph approaches that do not use probing; (b) graph approaches become equivalent to probing only when they also use probing features.
  2. Comparing learned probes to probes taken from randomly selected data. We show that probes from randomly selected data are as effective as latent-optimized (learned) ones.
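A minimal sketch of the vanilla probing baseline referenced above, under the assumption of image-classifier input models; the names and shapes are illustrative, not the paper's code.

import torch
import torch.nn as nn

# Vanilla probing: each probe is a free parameter optimized directly in
# input space, with no shared generator. Shapes assume 3x32x32 inputs.
n_probes = 64
probes = nn.Parameter(torch.randn(n_probes, 3, 32, 32))

def represent(model, probes):
    # Pass all probes through a frozen input model; gradients still flow
    # back to the probes, so they can be trained jointly with a predictor.
    outputs = model(probes)        # (n_probes, n_outputs), e.g. logits
    return outputs.flatten()       # ordered response vector for this model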

We propose Deep Linear Probe Generators (ProbeGen) for learning better probes. ProbeGen optimizes a deep generator module limited to linear expressivity, which shares information between the different probes. It then observes the responses from all probes and trains an MLP classifier on them. While simple, we demonstrate that it greatly enhances probing methods and outperforms other approaches by a large margin.
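An end-to-end training sketch under the same illustrative assumptions as above: model_dataset is a hypothetical iterable of (frozen 10-way classifier, attribute label) pairs, and DeepLinearGenerator is the module sketched earlier. This is only one plausible instantiation, not the paper's exact training code.

import torch
import torch.nn as nn

n_probes, n_outputs, n_classes = 64, 10, 5
generator = DeepLinearGenerator(n_probes=n_probes)       # sketched earlier
classifier = nn.Sequential(                              # MLP over the responses
    nn.Linear(n_probes * n_outputs, 256), nn.ReLU(), nn.Linear(256, n_classes))
optimizer = torch.optim.Adam(
    list(generator.parameters()) + list(classifier.parameters()), lr=1e-3)

for frozen_model, label in model_dataset:                # each sample is a network
    probes = generator().view(n_probes, 3, 32, 32)       # one shared generator
    responses = frozen_model(probes).flatten()           # responses over all probes
    logits = classifier(responses).unsqueeze(0)          # (1, n_classes)
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    optimizer.zero_grad(); loss.backward(); optimizer.step()

Only the generator and classifier parameters are in the optimizer, so the probed models stay frozen while gradients still flow through them to the probes.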

Main results of our method, ProbeGen

ProbeGen represents each model as an ordered list of output values based on carefully chosen probes. These representations often have semantic meaning, as the output space of the model (here, image pixels or logits) is semantic by design.
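To make this concrete, a short illustration reusing the hypothetical represent sketch above for 10-way classifiers: the response vector can be reshaped into per-probe logits, so each coordinate keeps a fixed class semantic.

responses = represent(frozen_model, probes)        # from the sketch above
per_probe_logits = responses.view(n_probes, 10)    # row i: logits on probe i
# Entry (i, j) is the model's confidence that probe i belongs to class j,
# so each coordinate of the representation carries a fixed semantic role.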

BibTeX

@misc{kahana2024deeplinearprobegenerators,
      title={Deep Linear Probe Generators for Weight Space Learning},
      author={Jonathan Kahana and Eliahu Horwitz and Imri Shuval and Yedid Hoshen},
      year={2024},
      eprint={2410.10811},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.10811},
}