easynlp.data
Dataset for sequence classification
- class easynlp.appzoo.sequence_classification.data.ClassificationDataset(pretrained_model_name_or_path, data_file, max_seq_length, input_schema, first_sequence, label_name=None, second_sequence=None, label_enumerate_values=None, multi_label=False, *args, **kwargs)
  Classification dataset.
Parameters: - pretrained_model_name_or_path -- name or path of the pretrained model, used to initialize the tokenizer.
 - data_file -- input data file.
 - max_seq_length -- maximum sequence length of each input instance.
 - first_sequence -- input text.
 - label_name -- label column name.
 - second_sequence -- optional second input text; defaults to None.
 - label_enumerate_values -- a list of label values.
 - multi_label -- set to True for multi-label classification, otherwise False.
 
- label_enumerate_values
  Returns the list of label values.
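As a rough illustration of how a list such as label_enumerate_values can drive label encoding, the sketch below builds a label-to-index map and encodes single-label and multi-label targets. It does not call EasyNLP: the helper names (build_label_map, encode_label) and the comma-separated multi-label format are assumptions made for this example only.

```python
from typing import Dict, List, Union


def build_label_map(label_enumerate_values: List[str]) -> Dict[str, int]:
    """Map each label value to its position in the list."""
    return {label: idx for idx, label in enumerate(label_enumerate_values)}


def encode_label(raw_label: str, label_map: Dict[str, int],
                 multi_label: bool = False) -> Union[int, List[int]]:
    """Encode a raw label string.

    Single-label: return the label index.
    Multi-label (assumed comma-separated here): return a multi-hot vector.
    """
    if not multi_label:
        return label_map[raw_label]
    multi_hot = [0] * len(label_map)
    for name in raw_label.split(","):
        name = name.strip()
        if name:
            multi_hot[label_map[name]] = 1
    return multi_hot


label_map = build_label_map(["negative", "neutral", "positive"])
single = encode_label("positive", label_map)
multi = encode_label("negative,positive", label_map, multi_label=True)
print(single)
print(multi)
```

Keeping the order of the label list fixed matters in a scheme like this, since the index of each label is simply its position in the list.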
- convert_single_row_to_example(row)
  Convert the tokens of a sample to indices.
Parameters: - row -- contains the sequences and the label.
 - text_a -- the first sequence in row.
 - text_b -- the second sequence in row, if self.second_sequence is set.
 - label -- the label token, if self.label_name is set.
 
- Returns: a single example
 - encoding: an example containing token indices.
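To make the row-to-encoding step more concrete, here is a minimal sketch that uses neither EasyNLP nor a real tokenizer. The whitespace tokenizer, the tiny vocabulary, the output key names, and the convert_row helper are all assumptions for illustration; the actual method relies on the tokenizer of the pretrained model.

```python
from typing import Dict, List, Optional

# Toy vocabulary (hypothetical); a real tokenizer ships its own.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "good": 4, "movie": 5, "bad": 6, "plot": 7}


def to_ids(text: str) -> List[int]:
    """Whitespace-tokenize and look up ids, falling back to [UNK]."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()]


def convert_row(text_a: str, text_b: Optional[str], label: Optional[str],
                label_map: Dict[str, int], max_seq_length: int) -> dict:
    """Build a BERT-style encoding: [CLS] a [SEP] (b [SEP]), then pad."""
    ids = [VOCAB["[CLS]"]] + to_ids(text_a) + [VOCAB["[SEP]"]]
    segment = [0] * len(ids)
    if text_b is not None:
        b_ids = to_ids(text_b) + [VOCAB["[SEP]"]]
        ids += b_ids
        segment += [1] * len(b_ids)
    # Naive truncation; for over-long inputs this may drop the final [SEP].
    ids, segment = ids[:max_seq_length], segment[:max_seq_length]
    mask = [1] * len(ids)
    pad = max_seq_length - len(ids)
    encoding = {
        "input_ids": ids + [VOCAB["[PAD]"]] * pad,
        "attention_mask": mask + [0] * pad,
        "token_type_ids": segment + [0] * pad,
    }
    if label is not None:
        encoding["label_ids"] = label_map[label]
    return encoding


encoding = convert_row("good movie", "bad plot", "pos", {"neg": 0, "pos": 1}, 8)
print(encoding)
```

Passing text_b=None and label=None mirrors the case where no second sequence or label column is configured.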
 
Dataset for sequence labeling
- class easynlp.appzoo.sequence_labeling.data.InputExample(text_a, text_b=None, label=None, guid=None)
  A single training/test example for simple sequence classification.
- class easynlp.appzoo.sequence_labeling.data.LabelingFeatures(input_ids, input_mask, segment_ids, all_tokens, label_ids, tok_to_orig_index, seq_length=None, guid=None)
  A single set of features of data for sequence labeling.
- easynlp.appzoo.sequence_labeling.data.bert_labeling_convert_example_to_feature(example, tokenizer, max_seq_length, label_map=None)
  Convert an InputExample into an InputFeature for the sequence labeling task.
Parameters: - example (InputExample) -- an input example
 - tokenizer (BertTokenizer) -- BERT Tokenizer
 - max_seq_length (int) -- maximum sequence length, used for truncation
 - label_map (dict) -- a map from label_value to label_idx; the task is treated as "regression" if it is None, otherwise as "classification"
 
Returns: an input feature
Return type: feature (InputFeatures)
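The sketch below illustrates one common way to align word-level labels with subword tokens and to fill fields like all_tokens, label_ids, tok_to_orig_index and input_mask. It is not EasyNLP's implementation: the fixed-size toy tokenizer, the -100 ignore index, labelling only the first piece of each word, and the convert_example helper are all assumptions for this example, which also assumes non-empty words.

```python
from typing import Dict, List, Tuple

IGNORE_INDEX = -100  # assumed convention for positions excluded from the loss


def toy_subword_tokenize(word: str, piece_len: int = 3) -> List[str]:
    """Hypothetical stand-in for WordPiece: split a word into fixed-size pieces."""
    pieces = [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def convert_example(words: List[str], labels: List[str],
                    label_map: Dict[str, int], max_seq_length: int
                    ) -> Tuple[List[str], List[int], List[int], List[int]]:
    """Return (all_tokens, label_ids, tok_to_orig_index, input_mask)."""
    all_tokens, label_ids, tok_to_orig_index = ["[CLS]"], [IGNORE_INDEX], [-1]
    for word_idx, (word, label) in enumerate(zip(words, labels)):
        for piece_idx, piece in enumerate(toy_subword_tokenize(word)):
            all_tokens.append(piece)
            tok_to_orig_index.append(word_idx)
            # Only the first piece of each word carries the word's label.
            label_ids.append(label_map[label] if piece_idx == 0 else IGNORE_INDEX)
    # Truncate, leaving room for the closing [SEP].
    all_tokens = all_tokens[:max_seq_length - 1] + ["[SEP]"]
    label_ids = label_ids[:max_seq_length - 1] + [IGNORE_INDEX]
    tok_to_orig_index = tok_to_orig_index[:max_seq_length - 1] + [-1]
    input_mask = [1] * len(all_tokens)
    # Pad every sequence to max_seq_length.
    pad = max_seq_length - len(all_tokens)
    all_tokens += ["[PAD]"] * pad
    label_ids += [IGNORE_INDEX] * pad
    tok_to_orig_index += [-1] * pad
    input_mask += [0] * pad
    return all_tokens, label_ids, tok_to_orig_index, input_mask


label_map = {"O": 0, "B-LOC": 1}
tokens, label_ids, tok_to_orig, mask = convert_example(
    ["in", "Berlin"], ["O", "B-LOC"], label_map, max_seq_length=8)
print(tokens)
print(label_ids)
print(tok_to_orig)
```

An index like tok_to_orig_index lets predictions made on subword pieces be mapped back to the original words after inference.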
- class easynlp.appzoo.sequence_labeling.data.SequenceLabelingDataset(pretrained_model_name_or_path, data_file, max_seq_length, first_sequence, label_name=None, label_enumerate_values=None, *args, **kwargs)
  Sequence labeling dataset.
Parameters: - pretrained_model_name_or_path -- name or path of the pretrained model, used to initialize the tokenizer.
 - data_file -- input data file.
 - max_seq_length -- maximum sequence length of each input instance.
 - first_sequence -- input sequence.
 - label_name -- label column name.
 - label_enumerate_values -- the list of label values.
 
- label_enumerate_values
Dataset for language modeling
- class easynlp.appzoo.language_modeling.data.LanguageModelingDataset(pretrained_model_name_or_path, data_file, max_seq_length, user_defined_parameters, mlm_mask_prop=0.15, **kwargs)
  Language modeling dataset with whole-word masking.
Parameters: - pretrained_model_name_or_path -- name or path of the pretrained model, used to initialize the tokenizer.
 - data_file -- input data file.
 - max_seq_length -- maximum sequence length of each input instance.
 - mlm_mask_prop -- the proportion of words to mask (default 0.15).
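For intuition about whole-word masking and the role of mlm_mask_prop, here is a simplified sketch. It is not the dataset's implementation: the whole_word_mask helper, the "##" continuation prefix, and the seed argument are assumptions for this example. It also always substitutes [MASK], whereas BERT-style pipelines typically mix in random and unchanged tokens.

```python
import random
from typing import List, Tuple


def whole_word_mask(tokens: List[str], mlm_mask_prop: float = 0.15,
                    seed: int = 0) -> Tuple[List[str], List[int]]:
    """Mask whole words: a word is its first piece plus following '##' pieces.

    Returns the masked tokens and the sorted positions that were masked.
    """
    # Group token positions into words.
    words: List[List[int]] = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    if not words:
        return list(tokens), []
    # Mask roughly mlm_mask_prop of the words, but at least one.
    num_to_mask = min(len(words), max(1, round(len(words) * mlm_mask_prop)))
    rng = random.Random(seed)
    chosen = rng.sample(words, num_to_mask)
    masked = list(tokens)
    masked_positions = sorted(i for word in chosen for i in word)
    for i in masked_positions:
        masked[i] = "[MASK]"
    return masked, masked_positions


tokens = ["the", "play", "##ing", "field", "is", "green", "##er", "today"]
masked, positions = whole_word_mask(tokens, mlm_mask_prop=0.5, seed=0)
print(masked)
print(positions)
```

Because selection happens per word rather than per token, every piece of a chosen word is masked together, which is what distinguishes whole-word masking from token-level masking.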