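Data Loading and Preprocessing: Load your dataset into a pandas DataFrame where each row contains the input columns (ticket_category, ticket_type, ticket_item, ticket_summary, and ticket_desc) and the target column (person_who_resolved). The model needs integer class labels, so map each resolver name to an integer ID before splitting.

    import pandas as pd
    import torch
    from sklearn.model_selection import train_test_split
    from transformers import BertTokenizer, BertForSequenceClassification, Trainer

    # Load your dataset into a pandas DataFrame (assuming it's in a CSV file)
    df = pd.read_csv('your_dataset.csv')

    # Map each resolver name to an integer class ID; keep the ID -> name mapping for inference
    label_mapping = dict(enumerate(sorted(df['person_who_resolved'].unique())))
    label_to_id = {person: idx for idx, person in label_mapping.items()}
    df['label'] = df['person_who_resolved'].map(label_to_id)
    num_classes = len(label_mapping)

    # Split the dataset into training and validation sets (you can adjust the test_size)
    train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

Data Tokenization: Modify the tokenization function to tokenize the input columns (ticket_category, ticket_type, ticket_item, ticket_summary, and ticket_desc) together and add the encoded target column (person_who_resolved) as labels. The tokenizer expects a list of strings, not a pandas Series, so build the combined text row by row first.

    text_columns = ['ticket_category', 'ticket_type', 'ticket_item', 'ticket_summary', 'ticket_desc']

    def tokenize_data(data):
        # Join the text columns of each row with spaces; missing values become empty strings
        texts = data[text_columns].fillna('').astype(str).agg(' '.join, axis=1).tolist()
        inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        # Labels must be integer class IDs, not resolver names
        inputs["labels"] = torch.tensor(data['label'].tolist())
        return inputs

Model Configuration: Configure the model for sequence classification:

    model_name = 'bert-base-uncased'
    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_classes)

Training and Evaluation: Fine-tune the model and evaluate it with the updated datasets. Trainer indexes its datasets one example at a time, so wrap the tokenizer output in a small Dataset class:

    class TicketDataset(torch.utils.data.Dataset):
        # Wraps the tokenizer output so Trainer can fetch individual examples
        def __init__(self, encodings):
            self.encodings = encodings

        def __len__(self):
            return len(self.encodings["labels"])

        def __getitem__(self, idx):
            return {key: value[idx] for key, value in self.encodings.items()}

    train_dataset = TicketDataset(tokenize_data(train_df))
    val_dataset = TicketDataset(tokenize_data(val_df))

    # training_args and compute_metrics are assumed to be defined already
    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
    )
    trainer.train()
    results = trainer.evaluate()

Inference: For inference, use the same tokenization approach as in the tokenize_data function, then predict the person who resolved the ticket from the input text.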
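
    import torch

    # Concatenate the ticket fields in the same order used in tokenize_data
    new_data = ["Your input text goes here"]
    inputs = tokenizer(' '.join(new_data), padding=True, truncation=True, return_tensors="pt")
    # Move the inputs to the model's device (the model may be on a GPU after training)
    inputs = {key: value.to(model.device) for key, value in inputs.items()}

    model.eval()  # disable dropout for inference
    with torch.no_grad():  # gradients are not needed for prediction
        outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=1)

    # You'll need to map the predicted class to the actual resolved person using your label mapping
    # (indexed by the same integer class IDs the model was trained on).
    predicted_person = label_mapping[predicted_class.item()]
    print(f"Predicted person who resolved: {predicted_person}")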
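
The Trainer call above is passed a compute_metrics function that this answer does not show. As a rough sketch only, assuming plain accuracy is the metric you want (the original does not say which metric was used), it could look like the following. The example arrays are made up purely to illustrate the call.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Return accuracy for the (logits, labels) pair that Trainer passes in."""
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit in each row
    predictions = np.argmax(logits, axis=-1)
    accuracy = float((predictions == labels).mean())
    return {"accuracy": accuracy}


# Hand-made check (hypothetical values): three examples, two classes
example_logits = np.array([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]])
example_labels = np.array([0, 1, 1])
print(compute_metrics((example_logits, example_labels)))
```

In the hand-made check, two of the three predictions match the labels, so the reported accuracy is 2/3. The training_args object would typically be a transformers TrainingArguments instance; its settings (output directory, epochs, batch size) are not given in this answer, so choose them for your own setup.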