
Train error during evaluation with 1 GPU, and training with multiple GPUs #82

@segalinc

Description


Hi, thanks for this contribution!
As a small exercise I am training SD2 on the Pokemon dataset.
I precomputed the latents and training starts fine on one GPU.
However, at evaluation time I get the following error:

File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/composer/trainer/trainer.py", line 2814, in _eval_loop
    self.state.outputs = self._original_model.eval_forward(self.state.batch)
  File "/fsx_vfx/users/csegalin/code/diffusion/diffusion/models/stable_diffusion.py", line 255, in eval_forward
    gen_images = self.generate(tokenized_prompts=prompts,
  File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/fsx_vfx/users/csegalin/code/diffusion/diffusion/models/stable_diffusion.py", line 464, in generate
    pred = self.unet(latent_model_input,
  File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 934, in forward
    sample = self.conv_in(sample)
  File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (162 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size
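
For context, this failure is reproducible in isolation: F.conv2d raises exactly this error whenever a (padded) spatial dimension of the input is smaller than the kernel, so the padded tensor reaching the UNet's conv_in here is only 2 wide. A minimal sketch of the error class (shapes are illustrative, lifted from the message above; the real conv_in lives inside diffusers' UNet2DConditionModel):

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 162, 2)       # latent map with a degenerate width of 2
weight = torch.randn(320, 4, 3, 3)  # a 3x3 conv, 4 -> 320 channels, like conv_in
F.conv2d(x, weight)                 # RuntimeError: Kernel size can't be greater
                                    # than actual input size

Since SD2's VAE downsamples by 8, a 256px model should feed the UNet square 32 x 32 latent maps; a (162 x 2) map points at wrongly shaped latents rather than at the conv itself.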

This is my configuration:

name: trial0 # Insert wandb run name
project: pokemon_sd2_256 # Insert wandb project name
seed: 17
eval_first: false
algorithms:
  low_precision_groupnorm:
    attribute: unet
    precision: amp_fp16
  low_precision_layernorm:
    attribute: unet
    precision: amp_fp16
model:
  _target_: diffusion.models.models.stable_diffusion_2
  pretrained: false
  precomputed_latents: true
  encode_latents_in_fp16: true
  fsdp: true
  val_metrics:
    - _target_: torchmetrics.MeanSquaredError
    - _target_: torchmetrics.image.fid.FrechetInceptionDistance
      normalize: true
  val_guidance_scales: [3, 7]
  # val_guidance_scales: []
  loss_bins: []
dataset:
  train_batch_size: 1 # Global training batch size
  eval_batch_size: 1  # Global evaluation batch size
  train_dataset:
    _target_: diffusion.datasets.pokemon.pokemon.build_streaming_dataloader
    local: /fsx_vfx/users/csegalin/data/pokemon/latents2_train # Path to local dataset(s)
    mode: 0
    version: 2
    drop_last: false
    shuffle: true
    prefetch_factor: 2
    num_workers: 8
    persistent_workers: true
    pin_memory: true
  eval_dataset:
    _target_: diffusion.datasets.pokemon.pokemon.build_streaming_dataloader
    local: /fsx_vfx/users/csegalin/data/pokemon/latents2_eval # Path to local dataset cache
    prefetch_factor: 2
    num_workers: 8
    persistent_workers: true
    pin_memory: true
    mode: 0
    version: 2
optimizer:
  _target_: torch.optim.AdamW
  lr: 1.0e-5
  weight_decay: 0.01
scheduler:
  _target_: composer.optim.LinearWithWarmupScheduler
  t_warmup: 1000ba
  alpha_f: 1.0
logger:
  comet-ml:
    _target_: composer.loggers.cometml_logger.CometMLLogger
    name: ${name}
    project_name: ${project}
callbacks:
  speed_monitor:
    _target_: composer.callbacks.speed_monitor.SpeedMonitor
    window_size: 10
  lr_monitor:
    _target_: composer.callbacks.lr_monitor.LRMonitor
  memory_monitor:
    _target_: composer.callbacks.memory_monitor.MemoryMonitor
  runtime_estimator:
    _target_: composer.callbacks.runtime_estimator.RuntimeEstimator
  optimizer_monitor:
    _target_: composer.callbacks.OptimizerMonitor
  image_monitor:
    _target_: diffusion.callbacks.log_diffusion_images.LogDiffusionImages
    prompts: # add any prompts you would like to visualize
    - cute dragon creature
    size: 256 # generated image resolution
    guidance_scale: 3
trainer:
  _target_: composer.Trainer
  device: gpu
  max_duration: 550000ba
  eval_interval: 1000ba
  device_train_microbatch_size: 1
  run_name: ${name}
  seed: ${seed}
  save_folder: trained_model # Insert path to save folder or bucket
  save_interval: 3000ba
  save_overwrite: true
  autoresume: false
  # fsdp_config:
  #   sharding_strategy: "SHARD_GRAD_OP"
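
Since the model runs with precomputed_latents: true, one thing worth checking is the spatial shape of the latents stored in the eval shards. A hedged sanity-check sketch; check_latent_shapes and the 'image_latents' batch key are hypothetical names, so substitute whatever the pokemon dataset module actually yields:

import torch
from torch.utils.data import DataLoader

def check_latent_shapes(loader: DataLoader, image_size: int = 256,
                        latent_key: str = 'image_latents') -> None:
    # SD2's VAE downsamples by 8, so 256px images should give 32x32 latents.
    expected = image_size // 8
    batch = next(iter(loader))
    latents = batch[latent_key]  # key name is a guess; adjust to your dataset
    print('latent shape:', tuple(latents.shape))
    assert latents.ndim == 4 and latents.shape[1] == 4, 'expected NCHW latents with 4 channels'
    assert latents.shape[-2:] == (expected, expected), (
        f'got spatial size {tuple(latents.shape[-2:])}, expected {(expected, expected)}')

A (162 x 2) latent map, as in the traceback above, fails this check immediately and narrows the bug down to latent precomputation or the dataloader rather than the eval loop.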

