cnlpt.cnlp_processors module¶
Module containing processor classes, evaluation metrics, and output modes for tasks defined in the library.
Add custom classes here to add new tasks to the library with the following steps:
Create a unique task_name for your task.
cnlp_output_modes – Add a mapping from the task name to a task type. Currently supported task types are sentence classification, tagging, relation extraction, and multi-task sentence classification.
Processor class – Create a subclass of transformers.DataProcessor for your data source. There are multiple examples to base yours on, including intermediate abstractions such as LabeledSentenceProcessor, RelationProcessor, and SequenceProcessor, which simplify the implementation.
cnlp_processors – Add a mapping from your task name to the processor class you created in the previous step.
(Optional) – Modify cnlp_compute_metrics() to add your task. If your task is classification, a reasonable default is used, so this step is optional.
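The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's own code: the two dictionaries and the base class are local stand-ins for the ones in cnlpt.cnlp_processors so that the snippet runs on its own, and the task name my_task, the class MyTaskProcessor, its labels, and the output-mode string 'classification' are hypothetical.

```python
# Local stand-ins so the sketch runs without cnlpt installed. In the library,
# these names are defined in cnlpt.cnlp_processors.
class LabeledSentenceProcessor:
    """Stand-in for the library's labeled-sentence base class."""

    def get_one_score(self, results):
        raise NotImplementedError


cnlp_processors = {}
cnlp_output_modes = {}
classification = "classification"  # assumed value of the output-mode constant

# Step 1: choose a unique task name (hypothetical).
TASK_NAME = "my_task"


# Step 3: write a processor for the data source.
class MyTaskProcessor(LabeledSentenceProcessor):
    """Hypothetical processor for a three-way sentence classification task."""

    def get_labels(self):
        return ["negative", "neutral", "positive"]

    def get_one_score(self, results):
        # Pick the best epoch by F1; assumes the metrics dict has an 'f1' key.
        return results["f1"]


# Step 2: map the task name to a task type.
cnlp_output_modes[TASK_NAME] = classification
# Step 4: map the task name to the processor class.
cnlp_processors[TASK_NAME] = MyTaskProcessor

processor = cnlp_processors[TASK_NAME]()
print(processor.get_labels())
```

Step 5 is left out here because, for a plain classification task, the default metric described under cnlp_compute_metrics() applies.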
- cnlpt.cnlp_processors.cnlp_processors¶
Mapping from task names to processor classes
- Type
- cnlpt.cnlp_processors.cnlp_output_modes¶
Mapping from task names to output modes
- cnlpt.cnlp_processors.tagging_metrics(task_name, preds, labels)¶
One of the metrics functions for use in cnlp_compute_metrics().
Generates evaluation metrics for sequence tagging tasks.
Ignores tags for which the true label is -100.
The returned dict is structured as follows:
{
    'acc': accuracy
    'token_f1': token-wise F1 score
    'f1': seqeval F1 score
    'report': seqeval classification report
}
- Parameters
task_name (str) – the task name used to index into cnlp_processors
preds (numpy.ndarray) – the predicted labels from the model
labels (numpy.ndarray) – the true labels
- Return type
- Returns
a dictionary containing evaluation metrics
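To illustrate what ignoring positions with a true label of -100 means, here is a minimal NumPy sketch of the accuracy part only. The function name masked_accuracy and the sample arrays are made up for this example, and the seqeval-based scores are not reproduced; the library's implementation may differ in detail.

```python
import numpy as np

IGNORE_INDEX = -100  # label value that marks positions to skip


def masked_accuracy(preds: np.ndarray, labels: np.ndarray) -> float:
    """Accuracy over positions whose true label is not IGNORE_INDEX."""
    keep = labels != IGNORE_INDEX
    if not keep.any():
        return 0.0  # choice made for this sketch when nothing is scored
    return float((preds[keep] == labels[keep]).mean())


# Two padded sequences; -100 marks padding or sub-word positions.
labels = np.array([[0, 1, 2, -100], [1, -100, -100, -100]])
preds = np.array([[0, 1, 1, 0], [1, 2, 2, 2]])
print(masked_accuracy(preds, labels))  # 3 of the 4 scored positions match
```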
- cnlpt.cnlp_processors.relation_metrics(task_name, preds, labels)¶
One of the metrics functions for use in cnlp_compute_metrics().
Generates evaluation metrics for relation extraction tasks.
Ignores tags for which the true label is -100.
The returned dict is structured as follows:
{
    'f1': F1 score
    'acc': accuracy
    'recall': recall
    'precision': precision
}
- Parameters
task_name (str) – the task name used to index into cnlp_processors
preds (numpy.ndarray) – the predicted labels from the model
labels (numpy.ndarray) – the true labels
- Return type
- Returns
a dictionary containing evaluation metrics
- cnlpt.cnlp_processors.acc_and_f1(preds, labels)¶
One of the metrics functions for use in cnlp_compute_metrics().
Generates evaluation metrics for generic tasks.
The returned dict is structured as follows:
{
    'acc': accuracy
    'f1': F1 score
    'acc_and_f1': mean of accuracy and F1 score
    'recall': recall
    'precision': precision
}
- Parameters
preds (numpy.ndarray) – the predicted labels from the model
labels (numpy.ndarray) – the true labels
- Return type
- Returns
a dictionary containing evaluation metrics
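The shape of the returned dictionary can be reproduced with a small NumPy sketch. It assumes binary labels with 1 as the positive class; the function name acc_and_f1_sketch is hypothetical, and the library's own function may average over classes differently or return per-class values.

```python
import numpy as np


def acc_and_f1_sketch(preds: np.ndarray, labels: np.ndarray) -> dict:
    """Binary-only sketch; label 1 is treated as the positive class."""
    acc = float((preds == labels).mean())
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels != 1)).sum())
    fn = int(((preds != 1) & (labels == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "acc": acc,
        "f1": f1,
        "acc_and_f1": (acc + f1) / 2,  # mean of accuracy and F1
        "recall": recall,
        "precision": precision,
    }


result = acc_and_f1_sketch(np.array([1, 0, 1, 1]), np.array([1, 0, 0, 1]))
print(result)
```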
- cnlpt.cnlp_processors.cnlp_compute_metrics(task_name, preds, labels)¶
Function that defines and computes the metrics used for each task.
When adding a task definition to this file, add a branch to this function defining what its evaluation metric invocation should be. If the new task is a simple classification task, a sensible default is defined; falling back on this will trigger a warning.
- Parameters
task_name (str) – the task name used to index into cnlp_processors
preds (numpy.ndarray) – the predicted labels from the model
labels (numpy.ndarray) – the true labels
- Return type
- Returns
a dictionary containing evaluation metrics
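The fallback behaviour described above (a default metric for unknown tasks, accompanied by a warning) can be sketched as below. This sketch uses a dictionary lookup in place of the per-task branches the text describes, and every name in it (TASK_METRICS, simple_accuracy, compute_metrics_sketch, task_a) is hypothetical.

```python
import warnings

import numpy as np


def simple_accuracy(preds, labels):
    """Default metric used in this sketch: plain accuracy."""
    return {"acc": float((preds == labels).mean())}


# Hypothetical registry: task name -> metric function taking (preds, labels).
TASK_METRICS = {"task_a": simple_accuracy}


def compute_metrics_sketch(task_name, preds, labels):
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    metric_fn = TASK_METRICS.get(task_name)
    if metric_fn is None:
        # Unknown task: warn, then fall back on the default metric.
        warnings.warn(f"No metric defined for task '{task_name}'; using the default.")
        metric_fn = simple_accuracy
    return metric_fn(preds, labels)


print(compute_metrics_sketch("task_a", np.array([1, 0, 1]), np.array([1, 1, 1])))
```

A registry keeps the dispatch in one place, whereas explicit branches make it easier to pass task-specific arguments; either fits the description above.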
- class cnlpt.cnlp_processors.CnlpProcessor¶
Bases: DataProcessor
Base class for single-task dataset processors
- __init__(downsampling=None)¶
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- get_example_from_tensor_dict(tensor_dict)¶
Gets an example from a dict with tensorflow tensors.
- Parameters
tensor_dict – Keys and values should match the corresponding Glue tensorflow_dataset examples.
- get_train_examples(data_dir)¶
Gets a collection of [InputExample] for the train set.
- get_dev_examples(data_dir)¶
Gets a collection of [InputExample] for the dev set.
- get_test_examples(data_dir)¶
Gets a collection of [InputExample] for the test set.
- _create_examples(lines, set_type, sequence=False, relations=False)¶
This is an internal function, but it is included in the documentation to illustrate the input format for single-task datasets.
Creates examples for the training, dev and test sets from a headingless TSV file with one of the following structures:
For sequence classification:
label text
For sequence tagging:
tag1 tag2 ... tagN text
For relation tagging:
<source1,target1> , <source2,target2> , ... , <sourceN,targetN> text
TODO: check that these formats are correct
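As an illustration of the sequence classification layout only, the sketch below reads a two-column file with no header row. It assumes the label and the text are separated by a single tab, which the formats above do not state explicitly; the function name read_classification_tsv and the sample rows are made up.

```python
import csv
import io

# Two rows in the assumed "label<TAB>text" layout, with no header row.
tsv_data = "1\tNo evidence of pneumonia .\n-1\tPatient reports chest pain .\n"


def read_classification_tsv(handle):
    """Yield (label, text) pairs from a two-column TSV stream without a header."""
    # QUOTE_NONE keeps quotation marks in clinical text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        label, text = row
        yield label, text


examples = list(read_classification_tsv(io.StringIO(tsv_data)))
print(examples)
```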
- class cnlpt.cnlp_processors.LabeledSentenceProcessor¶
Bases: CnlpProcessor
Base class for labeled sentence dataset processors
- _create_examples(lines, set_type)¶
This is an internal function, but it is included in the documentation to illustrate the input format for single-task datasets.
Creates examples for the training, dev and test sets from a headingless TSV file with one of the following structures:
For sequence classification:
label text
For sequence tagging:
tag1 tag2 ... tagN text
For relation tagging:
<source1,target1> , <source2,target2> , ... , <sourceN,targetN> text
TODO: check that these formats are correct
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.NegationProcessor¶
Bases: LabeledSentenceProcessor
Processor for the negation datasets
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.UncertaintyProcessor¶
Bases: LabeledSentenceProcessor
Processor for the uncertainty datasets
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.HistoryProcessor¶
Bases: LabeledSentenceProcessor
Processor for the history datasets
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.DtrProcessor¶
Bases: LabeledSentenceProcessor
Processor for DocTimeRel datasets
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.AlinkxProcessor¶
Bases: LabeledSentenceProcessor
Processor for a THYME ALINK dataset (links that describe a change in the temporal status of an event). In the classifier version of the task, the model is _given_ an event known to have some aspectual status and labels that status.
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.AlinkProcessor¶
Bases: LabeledSentenceProcessor
Processor for a THYME ALINK dataset (links that describe a change in the temporal status of an event). In the classifier version of the task, the model is _given_ an event known to have some aspectual status and labels that status.
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.ContainsProcessor¶
Bases: LabeledSentenceProcessor
Processor for the narrative container relation (THYME). Describes the contains relation status between the two highlighted temporal entities (event or timex).
NONE - no relation, CONTAINS - arg 1 contains arg 2, CONTAINS-1 - arg 2 contains arg 1
- get_labels()¶
Gets the list of labels for this data set.
- class cnlpt.cnlp_processors.TlinkProcessor¶
Bases: LabeledSentenceProcessor
Processor for the narrative container relation (THYME). Describes the contains relation status between the two highlighted temporal entities (event or timex).
NONE - no relation, CONTAINS - arg 1 contains arg 2, CONTAINS-1 - arg 2 contains arg 1
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.TimeCatProcessor¶
Bases: LabeledSentenceProcessor
Processor for a THYME time expression dataset. In the classifier version of the task, the model is _given_ a time class and labels its time category (see labels below).
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.ContextualModalityProcessor¶
Bases: LabeledSentenceProcessor
Processor for a contextual modality dataset
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.UciDrugSentimentProcessor¶
Bases: LabeledSentenceProcessor
Processor for the UCI Drug Review sentiment classification dataset
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.Mimic_7_Processor¶
Bases: LabeledSentenceProcessor
TODO: docstring
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.Mimic_3_Processor¶
Bases: LabeledSentenceProcessor
TODO: docstring
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.CovidProcessor¶
Bases: LabeledSentenceProcessor
TODO: docstring
- get_labels()¶
Gets the list of labels for this data set.
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- class cnlpt.cnlp_processors.RelationProcessor¶
Bases: CnlpProcessor
Base class for relation extraction dataset processors
- _create_examples(lines, set_type)¶
This is an internal function, but it is included in the documentation to illustrate the input format for single-task datasets.
Creates examples for the training, dev and test sets from a headingless TSV file with one of the following structures:
For sequence classification:
label text
For sequence tagging:
tag1 tag2 ... tagN text
For relation tagging:
<source1,target1> , <source2,target2> , ... , <sourceN,targetN> text
TODO: check that these formats are correct
- class cnlpt.cnlp_processors.TlinkRelationProcessor¶
Bases: RelationProcessor
TODO: docstring
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- get_labels()¶
Gets the list of labels for this data set.
- class cnlpt.cnlp_processors.SequenceProcessor¶
Bases: CnlpProcessor
Base class for sequence tagging dataset processors
- _create_examples(lines, set_type)¶
This is an internal function, but it is included in the documentation to illustrate the input format for single-task datasets.
Creates examples for the training, dev and test sets from a headingless TSV file with one of the following structures:
For sequence classification:
label text
For sequence tagging:
tag1 tag2 ... tagN text
For relation tagging:
<source1,target1> , <source2,target2> , ... , <sourceN,targetN> text
TODO: check that these formats are correct
- class cnlpt.cnlp_processors.TimexProcessor¶
Bases: SequenceProcessor
TODO: docstring
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- get_labels()¶
Gets the list of labels for this data set.
- class cnlpt.cnlp_processors.EventProcessor¶
Bases: SequenceProcessor
TODO: docstring
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- get_labels()¶
Gets the list of labels for this data set.
- class cnlpt.cnlp_processors.DpheProcessor¶
Bases: SequenceProcessor
TODO: docstring
- get_one_score(results)¶
Return a single value to use as the score for selecting the best model epoch after training.
- get_labels()¶
Gets the list of labels for this data set.
- class cnlpt.cnlp_processors.MTLClassifierProcessor¶
Bases: DataProcessor
Base class for multi-task learning classification dataset processors
- get_classifiers()¶
Get the list of classification subtasks in this multi-task setting
- get_num_tasks()¶
Get the number of subtasks in this multi-task setting.
Equivalent to len(self.get_classifiers()).
- Return type
- Returns
the number of subtasks
- get_classifier_id()¶
Get the classifier ID name used in building the GUIDs for the transformers.InputExample instances.
Not necessarily equal to the task_name used as keys for cnlp_processors and cnlp_output_modes.
- Return type
- Returns
the value of the classifier ID
- get_default_label()¶
Get the default label to assign to unlabeled instances in the dataset.
- Return type
- Returns
the value of the default label
- get_example_from_tensor_dict(tensor_dict)¶
Not used.
- get_train_examples(data_dir)¶
Gets a collection of [InputExample] for the train set.
- get_dev_examples(data_dir)¶
Gets a collection of [InputExample] for the dev set.
- get_test_examples(data_dir)¶
Gets a collection of [InputExample] for the test set.
- _get_json_examples(fn, set_type)¶
This is an internal function, but it is included in the documentation to illustrate the input format for MTL datasets.
Creates examples for the training, dev and test sets from a JSON file with the following structure:
{
    "<guid_1>": {
        "text": "<text>",
        "labels": {
            "<task_1>": "<label>",
            ...
        }
    },
    ...
}
- Parameters
- Return type
- Returns
the examples loaded from the file
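A file in the JSON structure described for _get_json_examples() can be read with the standard json module. The sketch below is only an illustration: the function name load_mtl_examples, the GUIDs, the task names, and the labels are made up, and filling in a default label for tasks missing from an entry is an assumption based on the description of get_default_label().

```python
import json

# Sample data in the documented structure; all values are invented.
raw = """
{
  "doc_1": {"text": "Patient has asthma.", "labels": {"Asthma": "Y", "Obesity": "N"}},
  "doc_2": {"text": "No complaints.", "labels": {"Asthma": "N"}}
}
"""


def load_mtl_examples(json_text, tasks, default_label):
    """Return (guid, text, labels) tuples with one label per task, in task order."""
    data = json.loads(json_text)
    examples = []
    for guid, entry in data.items():
        task_labels = entry.get("labels", {})
        # Tasks without a label in this entry receive the default label.
        labels = [task_labels.get(task, default_label) for task in tasks]
        examples.append((guid, entry["text"], labels))
    return examples


examples = load_mtl_examples(raw, tasks=["Asthma", "Obesity"], default_label="Unlabeled")
for example in examples:
    print(example)
```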
- class cnlpt.cnlp_processors.MimicRadiProcessor¶
Bases: MTLClassifierProcessor
TODO: docstring
- get_classifiers()¶
Get the list of classification subtasks in this multi-task setting
- get_labels()¶
Gets the list of labels for this data set.
- get_classifier_id()¶
Get the classifier ID name used in building the GUIDs for the transformers.InputExample instances.
Not necessarily equal to the task_name used as keys for cnlp_processors and cnlp_output_modes.
- Return type
- Returns
the value of the classifier ID
- class cnlpt.cnlp_processors.i2b22008Processor¶
Bases: MTLClassifierProcessor
Processor for the i2b2-2008 disease classification dataset
- get_classifiers()¶
Get the list of classification subtasks in this multi-task setting
- get_labels()¶
Gets the list of labels for this data set.
- get_default_label()¶
Get the default label to assign to unlabeled instances in the dataset.
- Return type
- Returns
the value of the default label
- get_classifier_id()¶
Get the classifier ID name used in building the GUIDs for the transformers.InputExample instances.
Not necessarily equal to the task_name used as keys for cnlp_processors and cnlp_output_modes.
- Return type
- Returns
the value of the classifier ID
- cnlpt.cnlp_processors.relex = 'relations'¶
Add output modes for new tasks here.