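from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper, gen_features

# mean_columns, bool_cols, df_train and df_val are assumed to be defined earlier.
# Wrap each column name in its own list so every transformer receives 2D input.
mean_columns_2d = [[f] for f in mean_columns]
bool_columns_2d = [[f] for f in bool_cols]

# Numeric columns: fill missing values with the column mean, then standardize.
feature_def1 = gen_features(
    columns=mean_columns_2d,
    classes=[{'class': SimpleImputer, 'strategy': 'mean'}, StandardScaler],
)
# Boolean columns: fill missing values with the most frequent value.
feature_def2 = gen_features(
    columns=bool_columns_2d,
    classes=[{'class': SimpleImputer, 'strategy': 'most_frequent'}],
)

feature_def_all = feature_def1 + feature_def2
mapper = DataFrameMapper(feature_def_all)

# Fit on the training set only, then reuse the learned statistics on validation.
x_train = mapper.fit_transform(df_train).astype('float32')
x_val = mapper.transform(df_val).astype('float32')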
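
The last two lines fit the mapper on df_train and only apply it to df_val, so validation data is imputed and scaled with statistics learned from the training data. As a rough illustration of what that means for a single column of mean_columns, here is a plain-pandas sketch of mean imputation followed by standardization. It is not the scikit-learn or sklearn_pandas code; the column name and values are made up.

```python
import numpy as np
import pandas as pd

# Made-up training and validation data for a single numeric column.
df_train = pd.DataFrame({"column1": [1.0, np.nan, 3.0, 5.0]})
df_val = pd.DataFrame({"column1": [np.nan, 2.0]})


def fit_mean_scale(train_col):
    """Learn a fill value and scaling statistics from the training column only."""
    mean = train_col.mean()           # NaN values are skipped by default
    imputed = train_col.fillna(mean)  # mean imputation leaves the mean unchanged
    std = imputed.std(ddof=0)         # population std, as StandardScaler uses
    if std == 0:
        std = 1.0                     # StandardScaler also leaves constant columns unscaled
    return mean, std


def apply_mean_scale(col, mean, std):
    """Fill gaps with the training mean, then standardize with training stats."""
    return ((col.fillna(mean) - mean) / std).astype("float32")


mean, std = fit_mean_scale(df_train["column1"])
x_train = apply_mean_scale(df_train["column1"], mean, std)
x_val = apply_mean_scale(df_val["column1"], mean, std)
print(x_val.tolist())
```

The missing validation value becomes 0.0 after scaling because it is filled with the training mean, not with a mean computed on df_val.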

In the code above, mean_columns and bool_cols are assumed to be lists of column names. The list comprehension [[f] for f in mean_columns] wraps each name in its own one-element list. This matters because, in sklearn_pandas, the form of a column selector determines the shape of the data handed to the transformer: a plain string such as 'column1' yields a 1D array, while a list such as ['column1'] yields a 2D array with a single column. scikit-learn transformers like SimpleImputer and StandardScaler expect 2D input of shape (n_samples, n_features), so each name has to be wrapped in a list.
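
The string-versus-list distinction corresponds to ordinary pandas indexing. This small sketch (made-up data, and not DataFrameMapper's internal code) shows the two shapes side by side.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1.0, 2.0, 3.0], "column2": [0, 1, 1]})

as_string = df["column1"]   # selector 'column1'   -> 1D Series
as_list = df[["column1"]]   # selector ['column1'] -> 2D DataFrame with one column

print(as_string.shape)  # (3,)
print(as_list.shape)    # (3, 1)
```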

For example, if mean_columns is ['column1', 'column2'], the list comprehension produces [['column1'], ['column2']]. gen_features then creates one feature definition per inner list, each with its own freshly constructed SimpleImputer and StandardScaler.
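
The following sketch reproduces that list comprehension and then imitates how gen_features pairs each inner list with new transformer instances. gen_features_sketch, Imputer and Scaler are hypothetical stand-ins written for illustration, not the sklearn_pandas API; the real function may return additional per-feature options, which are omitted here.

```python
# Hypothetical placeholder transformers, used only to show instantiation.
class Imputer:
    def __init__(self, strategy):
        self.strategy = strategy


class Scaler:
    pass


def gen_features_sketch(columns, classes):
    """Hypothetical simplification of gen_features: one (selector, transformers)
    pair per element of `columns`, with new transformer objects for each."""
    feature_defs = []
    for selector in columns:
        transformers = []
        for spec in classes:
            if isinstance(spec, dict):
                params = dict(spec)        # copy so the caller's dict is untouched
                cls = params.pop("class")  # remaining keys become constructor kwargs
                transformers.append(cls(**params))
            else:
                transformers.append(spec())
        feature_defs.append((selector, transformers))
    return feature_defs


mean_columns = ["column1", "column2"]
mean_columns_2d = [[f] for f in mean_columns]
print(mean_columns_2d)  # [['column1'], ['column2']]

defs = gen_features_sketch(
    mean_columns_2d, [{"class": Imputer, "strategy": "mean"}, Scaler]
)
```

Because instances are created per selector, the imputer fitted on column1 never shares state with the one fitted on column2.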

bool_columns_2d is built in the same way from bool_cols. Those columns are only imputed with their most frequent value; they are not scaled.
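
For the boolean columns, the only step is SimpleImputer(strategy='most_frequent'). Here is a plain-pandas approximation for one column, assuming (for this sketch only) that the flags are stored as 0.0/1.0 with NaN marking missing values: learn the mode from the training data, fill gaps in both sets with it, then cast to float32 as the mapper's output is. This is not scikit-learn's implementation.

```python
import numpy as np
import pandas as pd

# Made-up flag column encoded as 0.0/1.0, with NaN for missing entries.
df_train = pd.DataFrame({"flag": [1.0, np.nan, 1.0, 0.0]})
df_val = pd.DataFrame({"flag": [np.nan, 0.0]})

# Learn the most frequent non-missing value from the training data only.
# Series.mode() returns its values sorted, so on a tie iloc[0] is the smallest,
# which matches SimpleImputer's documented tie-breaking for 'most_frequent'.
most_frequent = df_train["flag"].mode().iloc[0]

# Fill gaps with the training mode in both sets, then cast to float32.
x_train = df_train["flag"].fillna(most_frequent).astype("float32")
x_val = df_val["flag"].fillna(most_frequent).astype("float32")
print(most_frequent, x_val.tolist())
```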

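Because gen_features treats each inner list as one column selector, this format also lets you group columns: an inner list such as ['column1', 'column2'] would pass both columns together to a single transformer instance, while one-element lists give every column its own imputer (and, for mean_columns, its own scaler).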
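
A short pandas/NumPy sketch of the grouping idea: a selector with two names yields a two-column block, a one-name selector yields a one-column block, and the blocks are concatenated side by side, which is roughly how DataFrameMapper assembles its output by default. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# One grouped selector and one single-column selector (hypothetical layout).
selectors = [["a", "b"], ["c"]]
blocks = [df[sel].to_numpy() for sel in selectors]
print([block.shape for block in blocks])  # [(2, 2), (2, 1)]

# The per-selector outputs are stacked horizontally into one feature matrix.
x = np.hstack(blocks).astype("float32")
print(x.shape)  # (2, 3)
```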
