Untitled (Python), by user_3839718, a year ago
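import pandas as pd
from datasets import Dataset


def build_tokenized_dataset(es_client, tokenize_function):
    # Hypothetical wrapper name. Despite its name, `es_client` is assumed to yield recipe
    # records with list-valued 'ingredients'/'directions'; `tokenize_function` is defined elsewhere.
    # Shuffle all rows, then reset to a clean 0..n-1 index.
    df = pd.DataFrame(es_client).sample(frac=1).reset_index(drop=True)
    # .copy() avoids pandas' SettingWithCopyWarning when the columns are reassigned below.
    data = df[['title', 'ingredients', 'directions']].copy()

    # Join each list of ingredients/steps into one ' | '-delimited string.
    data['ingredients'] = data['ingredients'].apply(' | '.join)
    data['directions'] = data['directions'].apply(' | '.join)

    # Keep only recipes whose joined directions are at most 256 characters long.
    data = data[data['directions'].str.len() <= 256]

    # preserve_index=False stops the filtered index becoming an '__index_level_0__' column.
    dataset = Dataset.from_pandas(data, preserve_index=False)
    return dataset.map(tokenize_function, batched=True, batch_size=1000)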
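The last step hands `tokenize_function` (defined elsewhere, not shown in this paste) to `Dataset.map` with `batched=True`, so the function receives a column-oriented batch (a dict of lists, up to `batch_size` rows) rather than a single record. The sketch below is a simplified, pure-Python model of that calling convention under stated assumptions: `map_batched` and `toy_tokenize_function` are hypothetical names, the whitespace split stands in for a real tokenizer, and the model ignores library features such as caching, `remove_columns`, and multiprocessing.

```python
from typing import Callable, Dict, List

# Column-oriented batch: column name -> list of values, one entry per row.
Batch = Dict[str, List]


def toy_tokenize_function(batch: Batch) -> Batch:
    # Hypothetical stand-in for `tokenize_function`: a plain whitespace split.
    # A real one would typically call a Hugging Face tokenizer on these strings.
    return {"tokens": [text.split() for text in batch["directions"]]}


def map_batched(columns: Batch, fn: Callable[[Batch], Batch], batch_size: int) -> Batch:
    # Simplified model of Dataset.map(fn, batched=True, batch_size=...),
    # not the library's implementation. Only same-length outputs are supported.
    n_rows = len(next(iter(columns.values())))
    added: Batch = {}
    for start in range(0, n_rows, batch_size):
        # Each call receives a slice of at most `batch_size` rows per column.
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        batch_len = min(batch_size, n_rows - start)
        for name, values in fn(batch).items():
            if len(values) != batch_len:
                raise ValueError(f"column {name!r} returned {len(values)} rows, expected {batch_len}")
            added.setdefault(name, []).extend(values)
    # Returned columns are added to (or overwrite) the existing ones.
    return {**{name: list(values) for name, values in columns.items()}, **added}


columns = {
    "title": ["Toast", "Tea", "Salad"],
    "directions": ["Toast bread | Butter it", "Boil water | Steep tea", "Chop | Toss"],
}
result = map_batched(columns, toy_tokenize_function, batch_size=2)
print(result["tokens"][0])  # ['Toast', 'bread', '|', 'Butter', 'it']
```

With `batch_size=2` and three rows, the function is called twice (on 2 rows, then 1), which is why a batched tokenizer must handle a final batch smaller than `batch_size`.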