Deep Linear Probe Generators for Weight Space Learning

Hebrew University of Jerusalem
An overview of our method

We optimize a deep linear probe generator to create suitable probes for the model. That is, our generator includes no activations between its linear layers, yet stacking several linear layers imposes a desired structure on the probes. We then gather the model's responses over all probes and train a classifier to predict an attribute of interest about the model.
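As a rough illustration (not the paper's exact implementation), a deep linear generator can be sketched as stacked linear layers with no nonlinearities in between; the composition is still a linear map, but the depth changes the optimization dynamics. All names, dimensions, and hyperparameters below are illustrative assumptions.

import torch
import torch.nn as nn

class DeepLinearGenerator(nn.Module):
    # Hypothetical sketch: maps a fixed learned latent per probe through
    # stacked linear layers with NO activations between them. The stack is
    # shared across all probes, biasing them towards structured solutions.
    def __init__(self, n_probes=64, latent_dim=128, probe_dim=3 * 32 * 32, depth=3):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_probes, latent_dim))
        dims = [latent_dim] * depth + [probe_dim]
        self.layers = nn.Sequential(
            *[nn.Linear(dims[i], dims[i + 1]) for i in range(depth)])

    def forward(self):
        # Returns one probe per latent, shape (n_probes, probe_dim).
        return self.layers(self.latents)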

Abstract

Weight space learning aims to extract information about a neural network, such as its training dataset or generalization error. Recent approaches learn directly from model weights, but this presents many challenges, as weights are high-dimensional and include permutation symmetries between neurons. An alternative approach, probing, represents a model by passing a set of learned inputs (probes) through the model and training a predictor on top of the corresponding outputs. Although probing is typically not used as a standalone approach, our preliminary experiments found that a vanilla probing baseline works surprisingly well. However, we discover that current probe learning strategies are ineffective. We therefore propose Deep Linear Probe Generators (ProbeGen), a simple and effective modification to probing approaches. ProbeGen adds a shared generator module with a deep linear architecture, providing an inductive bias towards structured probes and thus reducing overfitting. While simple, ProbeGen performs significantly better than the state of the art and is very efficient, requiring 30 to 1,000 times fewer FLOPs than the other top approaches.

Our hypothesis is that probing methods, when done right, hold significant potential. Drawing inspiration from binary code analysis, where dynamic approaches are more common than static ones, we believe that running neural networks, i.e., probing, is a promising approach for weight space learning. We begin with two preliminary experiments to test the quality and potential of probing approaches (a sketch of the vanilla probing baseline follows the list):

  1. Comparing a vanilla probing baseline to previous graph-based and mechanistic approaches. With enough probes: (a) vanilla probing outperforms graph approaches that do not use probing; (b) graph approaches become equivalent to probing only when they also use probing features.
  2. Comparing learned probes to probes taken from randomly selected data. We show that probes from randomly selected data are as effective as latent-optimized (learned) ones.
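A minimal sketch of the vanilla probing baseline referenced above, under the assumption of image-classifier input models; the names and shapes are illustrative, not the paper's code.

import torch
import torch.nn as nn

# Vanilla probing: each probe is a free parameter optimized directly in
# input space, with no shared generator. Shapes assume 3x32x32 inputs.
n_probes = 64
probes = nn.Parameter(torch.randn(n_probes, 3, 32, 32))

def represent(model, probes):
    # Pass all probes through a frozen input model; gradients still flow
    # back to the probes, so they can be trained jointly with a predictor.
    outputs = model(probes)        # (n_probes, n_outputs), e.g. logits
    return outputs.flatten()       # ordered response vector for this model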

We propose Deep Linear Probe Generators (ProbeGen) for learning better probes. ProbeGen optimizes a deep generator module limited to linear expressivity, which shares information between the different probes. It then observes the responses from all probes and trains an MLP classifier on them. While simple, we demonstrate that it greatly enhances probing methods and outperforms other approaches by a large margin.
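An end-to-end training sketch under the same illustrative assumptions as above: model_dataset is a hypothetical iterable of (frozen 10-way classifier, attribute label) pairs, and DeepLinearGenerator is the module sketched earlier. This is only one plausible instantiation, not the paper's exact training code.

import torch
import torch.nn as nn

n_probes, n_outputs, n_classes = 64, 10, 5
generator = DeepLinearGenerator(n_probes=n_probes)       # sketched earlier
classifier = nn.Sequential(                              # MLP over the responses
    nn.Linear(n_probes * n_outputs, 256), nn.ReLU(), nn.Linear(256, n_classes))
optimizer = torch.optim.Adam(
    list(generator.parameters()) + list(classifier.parameters()), lr=1e-3)

for frozen_model, label in model_dataset:                # each sample is a network
    probes = generator().view(n_probes, 3, 32, 32)       # one shared generator
    responses = frozen_model(probes).flatten()           # responses over all probes
    logits = classifier(responses).unsqueeze(0)          # (1, n_classes)
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    optimizer.zero_grad(); loss.backward(); optimizer.step()

Only the generator and classifier parameters are in the optimizer, so the probed models stay frozen while gradients still flow through them to the probes.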

Main results of our method, ProbeGen

ProbeGen represents each model as an ordered list of output values based on carefully chosen probes. These representations often have semantic meaning, as the output space of the model (here, image pixels or logits) is semantic by design.
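To make this concrete, a short illustration reusing the hypothetical represent sketch above for 10-way classifiers: the response vector can be reshaped into per-probe logits, so each coordinate keeps a fixed class semantic.

responses = represent(frozen_model, probes)        # from the sketch above
per_probe_logits = responses.view(n_probes, 10)    # row i: logits on probe i
# Entry (i, j) is the model's confidence that probe i belongs to class j,
# so each coordinate of the representation carries a fixed semantic role.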

BibTeX

@misc{kahana2024deeplinearprobegenerators,
      title={Deep Linear Probe Generators for Weight Space Learning},
      author={Jonathan Kahana and Eliahu Horwitz and Imri Shuval and Yedid Hoshen},
      year={2024},
      eprint={2410.10811},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.10811},
}