Rapidformer Engine¶
rapidformer.engine.arguments module¶
- rapidformer.engine.arguments.parse_args(extra_args_provider=None, defaults={}, ignore_unknown_args=False)¶
Parse all arguments.
- Parameters
extra_args_provider -- Task-specific argument provider, if needed.
defaults -- Argument defaults.
ignore_unknown_args -- A boolean specifying whether to ignore unknown args.
- Returns
parser
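A minimal usage sketch follows. The provider shape (a callable that receives the argparse parser, registers extra flags, and returns it) and the micro_batch_size default are assumptions following the usual Megatron convention, not part of the documented API.

```python
from rapidformer.engine.arguments import parse_args

# Hypothetical provider: receives the argparse parser, registers
# task-specific flags, and returns the parser (Megatron-style convention).
def extra_args_provider(parser):
    group = parser.add_argument_group(title='my task')
    group.add_argument('--my-task-flag', type=int, default=1,
                       help='Example task-specific argument.')
    return parser

parsed = parse_args(extra_args_provider=extra_args_provider,
                    defaults={'micro_batch_size': 4},
                    ignore_unknown_args=True)
```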
rapidformer.engine.clip_grads module¶
- rapidformer.engine.clip_grads.clip_grad_norm_fp32(parameters, max_norm, norm_type=2)¶
- Clips gradient norm of an iterable of parameters whose gradients
are in fp32.
- Parameters
parameters -- An iterable of Tensors or a single Tensor whose gradients will be normalized.
max_norm -- Max norm of the gradients.
norm_type -- Type of the used p-norm. Can be 'inf' for infinity norm.
- Returns
Total norm of the parameters (viewed as a single vector).
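A minimal sketch of a call site, assuming the distributed/model-parallel state has already been initialized by the engine; the toy model and loss are illustrative only.

```python
import torch

from rapidformer.engine.clip_grads import clip_grad_norm_fp32

model = torch.nn.Linear(16, 16).cuda()
loss = model(torch.randn(4, 16, device='cuda')).sum()
loss.backward()

# Clip the global L2 norm of all gradients to 1.0; the return value is the
# total norm of the parameters viewed as a single vector.
total_norm = clip_grad_norm_fp32(model.parameters(), max_norm=1.0, norm_type=2)
print('total gradient norm:', total_norm)
```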
rapidformer.engine.engine module¶
- class rapidformer.engine.engine.RapidformerEngine(extra_args_provider=None, args_defaults={})¶
Bases:
object
The Rapidformer Engine class, which wraps the supported acceleration tricks.
- compose(model=None, optimizer=None, lr_scheduler_fn=None, model_optimizer_lrscheduler_provider_func=None)¶
Generate the wrapped model, optimizer, and lr_scheduler.
model, optimizer and lr_scheduler_fn are used by no-trainer users.
model_optimizer_lrscheduler_provider_func is used by trainer users.
lr_scheduler_fn can be built with the functools.partial API, for example (a fuller usage sketch follows this entry):
lr_scheduler_fn = partial(get_linear_schedule_with_warmup, num_warmup_steps=args.lr_warmup_iters, num_training_steps=args.train_iters)
- Parameters
model -- A HuggingFace, EasyTexminer, or Megatron model object.
optimizer -- A PyTorch optimizer.
lr_scheduler_fn -- An lr scheduler function object.
model_optimizer_lrscheduler_provider_func -- A function used as a callback to build the model, optimizer, and lr scheduler.
- Returns
model, optimizer, lr_scheduler
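A no-trainer usage sketch of compose, assuming the engine's constructor parses arguments and initializes the distributed state. The HuggingFace model name and the args.lr attribute are assumptions for illustration; args.lr_warmup_iters and args.train_iters come from the lr_scheduler_fn example above.

```python
from functools import partial

import torch
from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup

from rapidformer.engine.engine import RapidformerEngine
from rapidformer.engine.global_vars import get_args

engine = RapidformerEngine()  # parses args and sets up the distributed state
args = get_args()

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr)

# Build the lr scheduler lazily via functools.partial, as described above.
lr_scheduler_fn = partial(get_linear_schedule_with_warmup,
                          num_warmup_steps=args.lr_warmup_iters,
                          num_training_steps=args.train_iters)

model, optimizer, lr_scheduler = engine.compose(model=model,
                                                optimizer=optimizer,
                                                lr_scheduler_fn=lr_scheduler_fn)
```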
rapidformer.engine.global_vars module¶
- rapidformer.engine.global_vars.get_tokenizer()¶
Return tokenizer.
- rapidformer.engine.global_vars.get_args()¶
Return arguments.
- rapidformer.engine.global_vars.get_num_microbatches()¶
Return the number of microbatches.
- rapidformer.engine.global_vars.get_current_global_batch_size()¶
Return current global batch size.
- rapidformer.engine.global_vars.update_num_microbatches(consumed_samples, consistency_check=True)¶
- rapidformer.engine.global_vars.get_timers()¶
Return timers.
- rapidformer.engine.global_vars.get_logger()¶
Return logger.
- rapidformer.engine.global_vars.set_rapidformer_global_variables(extra_args_provider=None, args_defaults={}, ignore_unknown_args=False)¶
Set args, tokenizer, tensorboard-writer, adlr-autoresume, and timers.
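A minimal sketch of reading the globals once they have been set. Calling set_rapidformer_global_variables directly is only needed when not going through the engine or initialize_rapidformer; the args.train_iters attribute is taken from the compose example above.

```python
from rapidformer.engine.global_vars import (
    set_rapidformer_global_variables,
    get_args,
    get_num_microbatches,
)

# Populate the globals once, early in the program (the engine normally does
# this during initialization).
set_rapidformer_global_variables()

args = get_args()
print('train iters:', args.train_iters)
print('num microbatches:', get_num_microbatches())
```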
- class rapidformer.engine.global_vars.Timers(logger)¶
Bases:
object
Group of timers.
- write(names, writer, iteration, normalizer=1.0, reset=False)¶
Write timers to a TensorBoard writer.
- log(names, normalizer=1.0, reset=True)¶
Log a group of timers.
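A sketch of the expected usage, assuming the globals have been set by the engine and assuming the Megatron-style callable interface in which timers('name') returns an individual timer with start() and stop(); only write() and log() are documented above.

```python
import time

from rapidformer.engine.global_vars import get_timers

def train_step():
    time.sleep(0.1)  # placeholder for a real training step

timers = get_timers()

# Time one training step (per-timer start()/stop() is assumed to follow the
# Megatron Timers convention).
timers('train-step').start()
train_step()
timers('train-step').stop()

# Print the accumulated timing and reset it.
timers.log(['train-step'], normalizer=1.0, reset=True)
```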
- class rapidformer.engine.global_vars.Logger(log_file=None, level='info')¶
Bases:
object
- level_relations = {'crit': 50, 'debug': 10, 'error': 40, 'info': 20, 'warning': 30}¶
- rapidformer.engine.global_vars.build_num_microbatches_calculator(args)¶
- class rapidformer.engine.global_vars.NumMicroBatchesCalculator¶
Bases:
abc.ABC
- get()¶
- get_current_global_batch_size()¶
- abstract update(consumed_samples, consistency_check)¶
- class rapidformer.engine.global_vars.ConstantNumMicroBatches(global_batch_size, micro_batch_size, data_parallel_size)¶
Bases:
rapidformer.engine.global_vars.NumMicroBatchesCalculator
- update(consumed_samples, consistency_check)¶
- class rapidformer.engine.global_vars.RampupBatchsizeNumMicroBatches(start_batch_size, batch_size_increment, ramup_samples, global_batch_size, micro_batch_size, data_parallel_size)¶
Bases:
rapidformer.engine.global_vars.NumMicroBatchesCalculator
- update(consumed_samples, consistency_check)¶
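A sketch of the constant schedule, chosen to make the underlying arithmetic explicit: with a fixed global batch size, the number of microbatches is global_batch_size / (micro_batch_size * data_parallel_size). The concrete numbers are illustrative.

```python
from rapidformer.engine.global_vars import ConstantNumMicroBatches

calc = ConstantNumMicroBatches(global_batch_size=256,
                               micro_batch_size=4,
                               data_parallel_size=8)

print(calc.get())                            # 256 / (4 * 8) = 8 microbatches
print(calc.get_current_global_batch_size())  # 256

# With a constant schedule, update() does not change the number of
# microbatches as samples are consumed.
calc.update(consumed_samples=1024, consistency_check=True)
```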
rapidformer.engine.initialize module¶
- rapidformer.engine.initialize.initialize_rapidformer(extra_args_provider=None, args_defaults={}, ignore_unknown_args=False, allow_no_cuda=False)¶
Set global variables, initialize distributed, and set autoresume and random seeds.
- Parameters
extra_args_provider -- Task-specific argument provider, if needed.
args_defaults -- Argument defaults.
ignore_unknown_args -- A boolean specifying whether to ignore unknown args.
allow_no_cuda -- Should not be set unless using Megatron for CPU-only data processing; in general, do not set this unless you know what you are doing.
- Returns
A function to finalize distributed environment initialization (returned only when args.lazy_mpu_init == True).
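A minimal entry-point sketch for a script that does not construct RapidformerEngine directly; the args_defaults keys and the args.world_size attribute are assumptions based on the usual Megatron argument set.

```python
from rapidformer.engine.initialize import initialize_rapidformer
from rapidformer.engine.global_vars import get_args

# Initialize globals, distributed state, and random seeds in one call.
initialize_rapidformer(args_defaults={'micro_batch_size': 4},
                       ignore_unknown_args=True)

args = get_args()
print('world size:', args.world_size)
```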
rapidformer.engine.optimizer module¶
Megatron optimizer.
- class rapidformer.engine.optimizer.MegatronOptimizer(optimizer, clip_grad, log_num_zeros_in_grad, params_have_main_grad, use_contiguous_buffers_in_local_ddp)¶
Bases:
abc.ABC
- get_parameters()¶
- clip_grad_norm(clip_grad)¶
- count_zeros()¶
- abstract zero_grad(set_to_none=True)¶
- abstract get_loss_scale()¶
The output should be a cuda tensor of size 1.
- scale_loss(loss)¶
Simple scaling.
- abstract step()¶
- abstract reload_model_params()¶
Refreshes any internal state from the current model parameters. Call whenever the parameters are changed outside of the optimizer. For example, when a model is loaded from a checkpoint without loading the optimizer, the model parameters are updated, but for an fp16 optimizer with main parameters, the main parameters also need to be updated.
- abstract state_dict()¶
- abstract load_state_dict(state_dict)¶
- property state¶
- property param_groups¶
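A hedged sketch of how the scale_loss/step interface is typically driven in a training loop; model, optimizer (a MegatronOptimizer subclass produced by the engine), and batch are assumed inputs, and model(batch) is assumed to return a scalar loss.

```python
def train_step(model, optimizer, batch):
    """One training step using a MegatronOptimizer-style optimizer (sketch)."""
    optimizer.zero_grad()

    loss = model(batch)                       # forward pass producing a scalar loss
    scaled_loss = optimizer.scale_loss(loss)  # apply the current loss scale
    scaled_loss.backward()

    optimizer.step()                          # unscale, clip, and apply the update
    return loss, optimizer.get_loss_scale()
```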
- class rapidformer.engine.optimizer.Float16OptimizerWithFloat16Params(optimizer, clip_grad, log_num_zeros_in_grad, params_have_main_grad, use_contiguous_buffers_in_local_ddp, bf16, grad_scaler)¶
Bases:
rapidformer.engine.optimizer.MegatronOptimizer
Float16 optimizer for fp16 and bf16 data types.
- Parameters
optimizer -- Base optimizer such as Adam or SGD.
clip_grad -- Clip gradients with this global L2 norm. Note that clipping is ignored if clip_grad == 0.
log_num_zeros_in_grad -- Return the number of zeros in the gradients.
params_have_main_grad -- Flag indicating if parameters have a main_grad field. If this is set, we assume that the model parameters' gradients are stored in the main_grad field instead of the typical grad field. This happens for the DDP cases where there is a contiguous buffer holding the gradients. For example, for bfloat16 we want to do gradient accumulation and all-reduces in float32, and as a result we store those gradients in main_grad. Note that main_grad is not necessarily in float32.
bf16 -- If true, the model is running in bfloat16.
grad_scaler -- Used for scaling gradients. Note that this can be None, which happens when bf16 = True and no loss scale is used. Note that for bf16 = True we can have a constant gradient scaler; for bf16 = False, a grad scaler is always required.
- zero_grad(set_to_none=True)¶
We only need to zero the model related parameters, i.e., float16_groups & fp32_from_fp32_groups. We additionally zero fp32_from_float16_groups as a memory optimization to reduce fragmentation; in the case of set_to_none==True, the space used by this field can be safely deallocated at this point.
- get_loss_scale()¶
- reload_model_params()¶
- step()¶
- state_dict()¶
- load_state_dict(state_dict)¶
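A hedged checkpointing sketch showing where state_dict, load_state_dict, and reload_model_params fit; optimizer is assumed to be a Float16OptimizerWithFloat16Params built by the engine, and the file name is illustrative.

```python
import torch

def save_and_restore_optimizer(optimizer, path='optimizer.pt'):
    """Sketch: persist and restore a Float16OptimizerWithFloat16Params."""
    # Persist the optimizer state.
    torch.save(optimizer.state_dict(), path)

    # If the model weights are later restored without the optimizer state,
    # refresh the optimizer's internal copies from the current parameters.
    optimizer.reload_model_params()

    # Restore the optimizer state itself.
    optimizer.load_state_dict(torch.load(path))
```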
rapidformer.engine.schedules module¶
- rapidformer.engine.schedules.get_learning_rate_scheduler(optimizer)¶
Build the learning rate scheduler.
rapidformer.engine.utils module¶
- rapidformer.engine.utils.honor_type(obj, generator)¶
Cast a generator to the same type as obj (list, tuple or namedtuple)
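A small sketch of honor_type with a namedtuple, which is the case the helper exists for:

```python
from collections import namedtuple

from rapidformer.engine.utils import honor_type

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# Apply a transformation while preserving the namedtuple type.
doubled = honor_type(p, (value * 2 for value in p))
print(doubled)  # Point(x=2, y=4)
```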
- rapidformer.engine.utils.is_torch_tensor(tensor)¶
- rapidformer.engine.utils.recursively_apply(func, data, *args, test_type=<function is_torch_tensor>, error_on_other_type=False, **kwargs)¶
Recursively apply a function on a data structure that is a nested list/tuple/dictionary of a given base type.
- Parameters
func (callable) -- The function to recursively apply.
data (nested list/tuple/dictionary of main_type) -- The data on which to apply func.
*args -- Positional arguments that will be passed to func when applied on the unpacked data.
main_type (type, optional, defaults to torch.Tensor) -- The base type of the objects to which to apply func.
error_on_other_type (bool, optional, defaults to False) -- Whether to raise an error if, after unpacking data, an object that is not of type main_type is encountered. If False, the function will leave objects of types different than main_type unchanged.
**kwargs -- Keyword arguments that will be passed to func when applied on the unpacked data.
- Returns
The same data structure as data with func applied to every object of type main_type.
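A minimal sketch: halve every tensor inside a nested batch structure, leaving any non-tensor leaves untouched (error_on_other_type defaults to False). The batch contents are illustrative.

```python
import torch

from rapidformer.engine.utils import recursively_apply

batch = {
    'input_ids': torch.ones(2, 4, dtype=torch.long),
    'labels': [torch.zeros(2), torch.ones(2)],
}

# func is applied to every torch.Tensor found in the nested structure.
halved = recursively_apply(lambda t: t / 2, batch)
print(halved['labels'][1])  # tensor([0.5000, 0.5000])
```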
- rapidformer.engine.utils.gather(tensor)¶
- rapidformer.engine.utils.send_to_device(tensor, device)¶
Recursively sends the elements in a nested list/tuple/dictionary of tensors to a given device.
- Parameters
tensor (nested list/tuple/dictionary of torch.Tensor) -- The data to send to a given device.
device (torch.device) -- The device to send the data to.
- Returns
The same data structure as tensor with all tensors sent to the proper device.
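A minimal sketch moving a nested batch onto the current device; the batch contents are illustrative.

```python
import torch

from rapidformer.engine.utils import send_to_device

batch = {
    'input_ids': torch.ones(2, 4, dtype=torch.long),
    'attention_mask': [torch.ones(2, 4), torch.ones(2, 4)],
}

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Every tensor in the nested structure ends up on the target device.
batch = send_to_device(batch, device)
print(batch['input_ids'].device)
```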
- rapidformer.engine.utils.report_memory(name, logger)¶
Simple GPU memory report.
- rapidformer.engine.utils.unwrap_model(model, module_instances=<class 'torch.nn.parallel.distributed.DistributedDataParallel'>)¶
- rapidformer.engine.utils.average_losses_across_data_parallel_group(losses)¶
Reduce a tensor of losses across all GPUs.
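A hedged sketch combining the two helpers; it assumes torch.distributed and the data parallel group have already been initialized by the engine, and the toy module and loss value are illustrative.

```python
import torch

from rapidformer.engine.utils import (
    unwrap_model,
    average_losses_across_data_parallel_group,
)

# Peel a DistributedDataParallel wrapper off to reach the raw module,
# e.g. before saving a checkpoint.
base = torch.nn.Linear(8, 8).cuda()
ddp_model = torch.nn.parallel.DistributedDataParallel(base)
raw_model = unwrap_model(ddp_model)

# Average a per-rank loss across the data parallel group.
lm_loss = torch.tensor(0.7, device='cuda')
averaged = average_losses_across_data_parallel_group([lm_loss])
print(averaged[0].item())
```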