Data Loading and Preprocessing:
Load your dataset into a pandas DataFrame where each row contains the input columns (ticket_category, ticket_type, ticket_item, ticket_summary, and ticket_desc) and the target column (person_who_resolved).
import pandas as pd
from sklearn.model_selection import train_test_split

# Load your dataset into a pandas DataFrame (assuming it's in a CSV file)
df = pd.read_csv('your_dataset.csv')

# Split the dataset into training and validation sets (you can adjust the test_size)
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)
Data Tokenization:
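Modify the tokenization function to concatenate the input columns (ticket_category, ticket_type, ticket_item, ticket_summary, and ticket_desc) into a single text per ticket, tokenize that text, and attach the target column (person_who_resolved) as labels. The model expects integer class ids as labels, so each resolver name has to be mapped to an id first.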
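from datasets import Dataset  # Hugging Face `datasets` package

TEXT_COLUMNS = ['ticket_category', 'ticket_type', 'ticket_item', 'ticket_summary', 'ticket_desc']
# BERT needs integer class ids, so map each resolver name to an id (and back).
label_mapping = dict(enumerate(sorted(df['person_who_resolved'].unique())))  # id -> name
label2id = {name: idx for idx, name in label_mapping.items()}  # name -> id

def tokenize_data(data):
    # Join the text columns row by row; the tokenizer expects a list of strings, not a Series.
    texts = data[TEXT_COLUMNS].fillna('').astype(str).agg(' '.join, axis=1).tolist()
    inputs = tokenizer(texts, padding=True, truncation=True)
    inputs["labels"] = [label2id[name] for name in data['person_who_resolved']]
    # Trainer needs an indexable dataset, not a raw BatchEncoding.
    return Dataset.from_dict(dict(inputs))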
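Because the same concatenation has to be reproduced at inference time, it can help to keep it in one small helper. The sketch below is a plain-Python illustration on a single ticket represented as a dict. `TEXT_FIELDS` and `build_ticket_text` are hypothetical names, and skipping missing values is an assumption rather than something the original function does.

```python
import math

# Hypothetical helper: field names match the columns used in tokenize_data.
TEXT_FIELDS = ["ticket_category", "ticket_type", "ticket_item",
               "ticket_summary", "ticket_desc"]


def build_ticket_text(ticket):
    """Join the ticket's text fields with spaces, skipping missing values."""
    parts = []
    for field in TEXT_FIELDS:
        value = ticket.get(field)
        # pandas represents missing cells as NaN (a float), so skip those too.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


example_ticket = {
    "ticket_category": "Hardware",
    "ticket_type": "Incident",
    "ticket_item": float("nan"),  # missing value, as pandas would load it
    "ticket_summary": "Laptop will not boot",
    "ticket_desc": "Black screen after the vendor logo.",
}
print(build_ticket_text(example_ticket))
```

Using one helper for both training rows and new tickets keeps the field order identical, which matters because the model only ever sees the joined string.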
Model Configuration:
Configure the model for sequence classification:
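from transformers import BertTokenizer, BertForSequenceClassification

model_name = 'bert-base-uncased'
# One output class per distinct resolver in the dataset.
num_classes = df['person_who_resolved'].nunique()
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_classes)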
Training and Evaluation:
Fine-tune the model and evaluate it with the updated datasets:
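from transformers import Trainer

# Tokenize both splits with the same function.
train_dataset = tokenize_data(train_df)
val_dataset = tokenize_data(val_df)

# training_args (a TrainingArguments instance) and compute_metrics are assumed
# to be defined already in your training setup.
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
trainer.train()
results = trainer.evaluate()  # dict of evaluation metrics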
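The compute_metrics callback passed to the Trainer is not shown in this section. If one is not already defined, a minimal accuracy-only version might look like the sketch below. It is written without NumPy so it can be read and checked on plain lists, and it assumes the callback receives something that unpacks into a (logits, labels) pair, which is how it is commonly written.

```python
def compute_metrics(eval_pred):
    """Return plain accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    correct = 0
    total = 0
    for row, label in zip(logits, labels):
        # The index of the largest logit is the predicted class id.
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == label)
        total += 1
    return {"accuracy": correct / total if total else 0.0}


# Toy check: three examples, two of them predicted correctly.
toy_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
toy_labels = [1, 0, 1]
print(compute_metrics((toy_logits, toy_labels)))
```

With many resolvers and uneven ticket counts, accuracy alone can be misleading, so a per-class metric may be worth adding once this baseline works.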
Inference:
For inference, tokenize the new ticket text the same way as in the tokenize_data function (the same fields, joined in the same order), then take the highest-scoring class as the predicted resolver.
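import torch

new_data = ["Your input text goes here"]
inputs = tokenizer(
    ' '.join(new_data),
    padding=True,
    truncation=True,
    return_tensors="pt"
).to(model.device)  # keep the inputs on the same device as the model

model.eval()  # disable dropout so predictions are deterministic
with torch.no_grad():  # gradients are not needed at inference time
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1)

# You'll need to map the predicted class to the actual resolved person using your label mapping
# (a dict from integer class id to resolver name, the inverse of the training label encoding).
predicted_person = label_mapping[predicted_class.item()]
print(f"Predicted person who resolved: {predicted_person}")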
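The label_mapping used above has to be the exact inverse of whatever encoding produced the training labels, and it has to survive between training and inference. One way to do that, sketched below with toy names, is to build both directions once and store the id-to-name map as JSON. The function names `build_label_maps`, `save_label_mapping`, and `load_label_mapping` are hypothetical, and sorting the names to get stable ids is an assumption.

```python
import json
import os
import tempfile


def build_label_maps(names):
    """Assign a stable integer id to each distinct name (sorted order)."""
    id2label = dict(enumerate(sorted(set(names))))
    label2id = {name: idx for idx, name in id2label.items()}
    return label2id, id2label


def save_label_mapping(id2label, path):
    """Write the id -> name map to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(id2label, f)


def load_label_mapping(path):
    """Read the id -> name map back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings, so convert them back to ints.
    return {int(idx): name for idx, name in raw.items()}


resolvers = ["dana", "alex", "dana", "sam"]  # toy stand-in for the target column
label2id, id2label = build_label_maps(resolvers)

path = os.path.join(tempfile.mkdtemp(), "label_mapping.json")
save_label_mapping(id2label, path)
label_mapping = load_label_mapping(path)
print(label_mapping[label2id["dana"]])
```

The key conversion in `load_label_mapping` is the part that is easy to miss: without it, a lookup with the integer from `predicted_class.item()` would raise a KeyError.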