prepare_maxdiff_data
- pymc_marketing.customer_choice.maxdiff.prepare_maxdiff_data(task_df, items, respondent_id='respondent_id', task_id='task_id', item_col='item_id', best_col='is_best', worst_col='is_worst', reference_item=None)
Reshape long-format MaxDiff data into padded arrays for the likelihood.
Each row of task_df represents one shown item within one task. Tasks may show different numbers of items: ragged subsets are padded to K_max with the reference item, and mask marks which positions are real.
- Parameters:
- task_df : pd.DataFrame
  Long-format data with one row per (respondent, task, item) triple. Must contain the five columns named by respondent_id, task_id, item_col, best_col, and worst_col.
- items : list[str]
  Full item pool. Defines the items coord and the index mapping.
- respondent_id, task_id, item_col, best_col, worst_col : str
  Column names in task_df.
- reference_item : str, optional
  Item whose utility is pinned to 0 for identification. Defaults to items[-1].
- Returns:
MaxDiffArrays
  TypedDict of padded arrays and metadata.
- Raises:
ValueError
  If a task does not have exactly one best pick and exactly one worst pick, the best and worst picks in a task are the same item, an item repeats within a task, a task shows fewer than 2 items, or any item is outside the pool.
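The padding and validation rules described above can be illustrated with a small pure-Python sketch. This is not the library's implementation: pad_tasks is a hypothetical helper, it works on a list of dicts instead of a pd.DataFrame, and the output keys (item_idx, mask, best_pos, worst_pos) are names assumed for illustration only, since the fields of MaxDiffArrays are not listed here. Only the default column names, the items[-1] default for reference_item, and the ValueError conditions are taken from the documentation above.

```python
from collections import defaultdict

# Hypothetical long-format rows: one dict per (respondent, task, item) triple.
# Column names follow the documented defaults.
rows = [
    {"respondent_id": 1, "task_id": 1, "item_id": "A", "is_best": 1, "is_worst": 0},
    {"respondent_id": 1, "task_id": 1, "item_id": "B", "is_best": 0, "is_worst": 0},
    {"respondent_id": 1, "task_id": 1, "item_id": "C", "is_best": 0, "is_worst": 1},
    {"respondent_id": 1, "task_id": 2, "item_id": "B", "is_best": 0, "is_worst": 1},
    {"respondent_id": 1, "task_id": 2, "item_id": "D", "is_best": 1, "is_worst": 0},
]
items = ["A", "B", "C", "D"]


def pad_tasks(rows, items, reference_item=None):
    """Illustrative stand-in for the reshaping step (assumed, not the real code)."""
    reference_item = items[-1] if reference_item is None else reference_item
    index = {name: i for i, name in enumerate(items)}

    # Group the shown items by (respondent, task), preserving row order.
    tasks = defaultdict(list)
    for row in rows:
        tasks[(row["respondent_id"], row["task_id"])].append(row)

    # Mirror the documented ValueError conditions.
    for key, shown in tasks.items():
        names = [r["item_id"] for r in shown]
        if len(names) < 2:
            raise ValueError(f"task {key} shows fewer than 2 items")
        if len(set(names)) != len(names):
            raise ValueError(f"task {key} repeats an item")
        if any(name not in index for name in names):
            raise ValueError(f"task {key} has an item outside the pool")
        if sum(r["is_best"] for r in shown) != 1:
            raise ValueError(f"task {key} needs exactly one best pick")
        if sum(r["is_worst"] for r in shown) != 1:
            raise ValueError(f"task {key} needs exactly one worst pick")
        if any(r["is_best"] and r["is_worst"] for r in shown):
            raise ValueError(f"task {key} has best == worst")

    # Pad every task to the widest task (K_max) with the reference item.
    k_max = max(len(shown) for shown in tasks.values())
    item_idx, mask, best_pos, worst_pos = [], [], [], []
    for shown in tasks.values():
        n_pad = k_max - len(shown)
        item_idx.append(
            [index[r["item_id"]] for r in shown] + [index[reference_item]] * n_pad
        )
        mask.append([True] * len(shown) + [False] * n_pad)  # False marks padding
        best_pos.append(next(i for i, r in enumerate(shown) if r["is_best"]))
        worst_pos.append(next(i for i, r in enumerate(shown) if r["is_worst"]))

    return {
        "item_idx": item_idx,
        "mask": mask,
        "best_pos": best_pos,
        "worst_pos": worst_pos,
    }


padded = pad_tasks(rows, items)
print(padded["item_idx"])  # [[0, 1, 2], [1, 3, 3]]
print(padded["mask"])      # [[True, True, True], [True, True, False]]
```

In this sketch the second task shows only two items, so its third slot is filled with the index of the reference item ("D", the last entry of items) and flagged False in the mask. With the real function, the same rows would be passed as a DataFrame, for example prepare_maxdiff_data(pd.DataFrame(rows), items).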