Data Loading and Preprocessing:

Load your dataset into a pandas DataFrame where each row contains the input columns (ticket_category, ticket_type, ticket_item, ticket_summary, and ticket_desc) and the target column (person_who_resolved).

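import pandas as pd
from sklearn.model_selection import train_test_split

# Load your dataset into a pandas DataFrame (assuming it's in a CSV file)
df = pd.read_csv('your_dataset.csv')

# The model predicts integer class ids, so map each resolver name to an id (label2id)
# and keep the reverse mapping (label_mapping) to turn predictions back into names
label_names = sorted(df['person_who_resolved'].unique())
label2id = {name: i for i, name in enumerate(label_names)}
label_mapping = {i: name for name, i in label2id.items()}
num_classes = len(label_names)

# Split the dataset into training and validation sets (you can adjust test_size)
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)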
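Sequence-classification models work with integer class ids, so the resolver names in person_who_resolved need a two-way mapping. A minimal pure-Python sketch of that mapping, assuming the names are plain strings; the helper name build_label_maps and the sample names are hypothetical:

```python
from typing import Dict, Iterable, Tuple


def build_label_maps(names: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct resolver name to an integer id, and back.

    Sorting makes the ids deterministic for the same set of names.
    """
    distinct = sorted(set(names))
    label2id = {name: i for i, name in enumerate(distinct)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


# Hypothetical resolver names standing in for df['person_who_resolved']
resolvers = ["carol", "alice", "bob", "alice", "carol"]
label2id, id2label = build_label_maps(resolvers)

encoded = [label2id[name] for name in resolvers]  # names -> class ids
decoded = [id2label[i] for i in encoded]          # class ids -> names

print(label2id)  # {'alice': 0, 'bob': 1, 'carol': 2}
print(encoded)   # [2, 0, 1, 0, 2]
```

The number of entries in label2id is the value to pass as num_labels, and id2label is what turns a predicted id back into a name.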

Data Tokenization:

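Define a tokenization function that joins the input columns (ticket_category, ticket_type, ticket_item, ticket_summary, and ticket_desc) into one text per ticket, tokenizes it, and attaches the target column (person_who_resolved) as integer labels. The function uses the tokenizer created in the Model Configuration step, so call it only after that step has run.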

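import torch

def tokenize_data(data):
    # Join the five input columns into one text per ticket; missing values become ''
    texts = (
        data[['ticket_category', 'ticket_type', 'ticket_item', 'ticket_summary', 'ticket_desc']]
        .fillna('').astype(str).apply(' '.join, axis=1).tolist()
    )
    encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # label2id must map each resolver name to an integer class id 0..num_classes-1
    labels = data['person_who_resolved'].map(label2id).tolist()
    # Trainer expects an indexable dataset of per-example dicts, not one big batch
    return [
        {**{key: value[i] for key, value in encodings.items()}, "labels": torch.tensor(labels[i])}
        for i in range(len(labels))
    ]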
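The core of the tokenization step is turning five columns into one string per ticket. A small pure-Python sketch of that joining rule, assuming missing fields should be skipped rather than rendered as text; the function name build_ticket_text and the sample ticket are illustrative:

```python
from typing import Mapping, Optional

# The five input columns, in the order they are concatenated
INPUT_COLUMNS = ("ticket_category", "ticket_type", "ticket_item",
                 "ticket_summary", "ticket_desc")


def build_ticket_text(row: Mapping[str, Optional[str]]) -> str:
    """Join the input fields with single spaces, skipping missing (None) or blank ones."""
    parts = []
    for column in INPUT_COLUMNS:
        value = row.get(column)
        if value is None:
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


# Hypothetical ticket with a missing description
ticket = {
    "ticket_category": "Hardware",
    "ticket_type": "Incident",
    "ticket_item": "Laptop",
    "ticket_summary": "Screen flickers",
    "ticket_desc": None,
}
print(build_ticket_text(ticket))  # Hardware Incident Laptop Screen flickers
```

With pandas, a missing value is a float NaN rather than None, and str(NaN) is 'nan', which is why the DataFrame version calls fillna('') before joining.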

Model Configuration:

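Load the pretrained BERT tokenizer and configure the model for sequence classification, with one output class per distinct value of person_who_resolved: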

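from transformers import BertTokenizer, BertForSequenceClassification

model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
# num_classes = number of distinct values in person_who_resolved
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_classes)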
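num_labels sets the size of the linear classification layer that BertForSequenceClassification places on top of BERT's pooled output: it produces one score (logit) per class. That layer is newly initialized, which is why from_pretrained warns about untrained weights. A toy pure-Python sketch of the layer, with a 3-dimensional vector standing in for BERT's 768-dimensional pooled output and made-up weights (all values are illustrative, not learned):

```python
from typing import List, Sequence


def linear_head(hidden: Sequence[float],
                weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> List[float]:
    """One logit per class: logit_k = sum_j weights[k][j] * hidden[j] + biases[k]."""
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, biases)
    ]


num_classes = 3              # e.g. three distinct resolvers (illustrative)
pooled = [0.5, -1.0, 2.0]    # stand-in for BERT's pooled [CLS] vector
weights = [                  # shape: num_classes x hidden_size (made-up values)
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.5]

logits = linear_head(pooled, weights, biases)
print(logits)  # [0.5, -1.0, 2.5]
```

Class 2 has the highest logit, so it would be the prediction. In the real model these weights are learned during fine-tuning, and dropout is applied to the pooled output before this layer during training.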

Training and Evaluation:

Tokenize the training and validation splits, then fine-tune and evaluate the model with the Hugging Face Trainer:

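import numpy as np
from transformers import Trainer, TrainingArguments

train_dataset = tokenize_data(train_df)
val_dataset = tokenize_data(val_df)

# Example settings; tune epochs, batch size, etc. for your data and hardware
training_args = TrainingArguments(output_dir='./results', num_train_epochs=3,
                                  per_device_train_batch_size=16)

def compute_metrics(eval_pred):
    # Accuracy: share of validation tickets assigned to the right resolver
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()
results = trainer.evaluate()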
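Trainer calls compute_metrics with the validation logits and the true label ids. The accuracy it reports boils down to an argmax per row followed by a comparison; a dependency-free sketch with made-up logits for four tickets and three resolvers (the function names and data are illustrative):

```python
from typing import List, Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose highest-scoring class equals the true label."""
    predictions: List[int] = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical validation batch: 4 tickets, 3 resolvers
val_logits = [
    [2.0, 0.1, -1.0],  # predicted 0
    [0.3, 1.5, 0.2],   # predicted 1
    [0.0, 0.2, 0.9],   # predicted 2
    [1.2, 1.1, 0.4],   # predicted 0
]
val_labels = [0, 1, 2, 1]

print({"accuracy": accuracy(val_logits, val_labels)})  # {'accuracy': 0.75}
```

Keys returned by compute_metrics show up in the dict from trainer.evaluate() with an eval_ prefix, e.g. eval_accuracy.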

Inference:

For inference, build the input text the same way as in the tokenize_data function (the five ticket fields joined with spaces), tokenize it, and take the highest-scoring class as the predicted resolver.

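import torch

# Build each input the same way as in tokenize_data: the five ticket fields joined with spaces
new_data = ["Your input text goes here"]
inputs = tokenizer(
    new_data,  # a list of texts; each one is classified separately
    padding=True,
    truncation=True,
    return_tensors="pt"
).to(model.device)  # match the device the model is on after training

model.eval()  # disable dropout for deterministic predictions
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1)

# Map the predicted class id back to the resolver's name (label_mapping: class id -> name)
predicted_person = label_mapping[predicted_class[0].item()]
print(f"Predicted person who resolved: {predicted_person}")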
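argmax alone discards how confident the model is. A small pure-Python sketch that converts one row of logits into softmax probabilities and maps the top class back to a name; the logits and the id-to-name mapping are hypothetical:

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_person(logits: Sequence[float],
                   label_mapping: Dict[int, str]) -> Tuple[str, float]:
    """Return the highest-probability resolver and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return label_mapping[best], probs[best]


label_mapping = {0: "alice", 1: "bob", 2: "carol"}  # hypothetical id -> name mapping
person, confidence = predict_person([0.2, 2.0, -1.0], label_mapping)
print(f"Predicted person who resolved: {person} (p={confidence:.2f})")
# Predicted person who resolved: bob (p=0.82)
```

In the PyTorch code, the equivalent probabilities come from torch.softmax(outputs.logits, dim=1).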