
bias and kernel params are put on different gpu devices #4116

Open
YunxiTang opened this issue Aug 6, 2024 · 1 comment

Comments

YunxiTang commented Aug 6, 2024

System information

  • OS Platform and Distribution: Linux Ubuntu 20.04
  • Flax, jax, jaxlib versions: flax -> 0.6.11, jax -> 0.4.13, jaxlib -> 0.4.13+cuda11.cudnn86
  • Python version: Python 3.9.19
  • GPU/TPU model and memory: GPU 4090 with 24 GB
  • CUDA version: CUDA 11.8

Problem you have encountered:

When I try to initialize a Flax model on a specific GPU device (for example, GPU 1), the bias and kernel params end up on different GPU devices.

What you expected to happen:

The bias and kernel params should be placed on the same GPU device.

Steps to reproduce:

  import jax
  import jax.numpy as jnp
  from jax import tree_util
  from flax import linen as nn

  device = jax.devices("gpu")[1]

  class MyModel(nn.Module):
      @nn.compact
      def __call__(self, x):
          x = nn.Conv(64, (3, 3), 1, name='conv1')(x)
          x = nn.relu(x)
          return x

  rng = jax.random.PRNGKey(0)
  rng = jax.device_put(rng, device)
  dummy_input = jax.device_put(jnp.ones((5, 64, 64, 32)), device) 

  model = MyModel()  
  model_params = model.init({'params': rng}, dummy_input)
  # model_params = tree_util.tree_map(lambda x: jax.device_put(x, device), model_params)
  print(tree_util.tree_map(lambda x: x.device(), model_params))

The output is:

FrozenDict({
    params: {
        conv1: {
            bias: gpu(id=0),
            kernel: gpu(id=1),
        },
    },
})
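
For reference, uncommenting the tree_util.tree_map line in the repro gives one workaround (a minimal sketch, reusing model_params, device, and tree_util from the snippet above): explicitly moving every leaf with jax.device_put yields a consistent placement, although I would expect init to do this on its own.

  # Workaround sketch: move every param leaf to the target device.
  model_params = tree_util.tree_map(lambda x: jax.device_put(x, device), model_params)
  # Every leaf now reports the same device, e.g. gpu(id=1).
  print(tree_util.tree_map(lambda x: x.device(), model_params))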

Thanks!

@MasterSkepticista

Hi @YunxiTang, I am able to reproduce this issue.

In practice, I have seen Flax models initialized on CPU and then migrated/replicated to devices. Two examples (a self-contained sketch of the second follows the list):

  1. Migrating params post-initialization to GPU.
    # Optional: Init on `cpu`.
    model_params = jax.jit(model.init, backend="cpu")({'params': rng}, dummy_input)
    model_params = jax.device_put(model_params, device)
    jax.tree.map(lambda p: p.device, model_params)
    # {'params': {'conv1': {'bias': CudaDevice(id=1), 'kernel': CudaDevice(id=1)}}}
  2. Using jax.default_device scope.
    with jax.default_device(device):
        model_params = model.init({'params': rng}, dummy_input)
        print(tree_util.tree_map(lambda x: (x.device), model_params))
        # {'params': {'conv1': {'bias': CudaDevice(id=1), 'kernel': CudaDevice(id=1)}}}
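
Put together, a minimal self-contained sketch of option 2 (assuming a machine with at least two GPUs and the same MyModel as in the repro; whether .device is a method or a property depends on the jax version):

  import jax
  import jax.numpy as jnp
  from flax import linen as nn

  class MyModel(nn.Module):
      @nn.compact
      def __call__(self, x):
          x = nn.Conv(64, (3, 3), 1, name='conv1')(x)
          return nn.relu(x)

  device = jax.devices("gpu")[1]

  # Arrays created inside this scope default to `device`,
  # so kernel and bias should land on the same GPU.
  with jax.default_device(device):
      rng = jax.random.PRNGKey(0)
      dummy_input = jnp.ones((5, 64, 64, 32))
      model_params = MyModel().init({'params': rng}, dummy_input)

  print(jax.tree_util.tree_map(lambda p: p.device, model_params))
  # Expected: every leaf on the same device, e.g. CudaDevice(id=1).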

I will let the Flax team comment on the default behavior in your case.
