generate_maxdiff_conjoint_data

pymc_marketing.customer_choice.synthetic_data.generate_maxdiff_conjoint_data(n_respondents=150, n_items=12, item_attributes=None, utility_formula='~ 0 + C(brand) + price + quality', true_betas=None, n_tasks_per_resp=12, subset_size=4, random_attributes=None, sigma_respondent=0.4, items=None, random_seed=None)

Generate synthetic MaxDiff data with item-attribute utilities (part-worths).

Simulates a MaxDiff survey where each item has a fixed attribute profile and utilities are computed as \(U_i = X_i^\top \beta + \text{noise}\). Respondents optionally carry heterogeneous part-worths on a subset of features (analogous to the random-coefficients formulation in MaxDiffMixedLogit).
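The utility model above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the Gumbel noise, the sequential best-then-worst pick, and all variable names other than `sigma_respondent` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_respondents, n_items, n_features = 5, 6, 3
X = rng.normal(size=(n_items, n_features))   # item design matrix, one row per item
betas = np.array([1.0, -0.5, 0.25])          # population part-worths (illustrative values)
sigma_respondent = 0.4
random_cols = [0, 2]                         # features whose part-worths vary by respondent

# Per-respondent part-worths: population value plus a Normal deviation,
# applied only to the random-feature columns.
respondent_betas = np.tile(betas, (n_respondents, 1))
respondent_betas[:, random_cols] += sigma_respondent * rng.normal(
    size=(n_respondents, len(random_cols))
)

# Deterministic utilities: U[r, i] = X[i] @ respondent_betas[r]
U = respondent_betas @ X.T                   # shape (n_respondents, n_items)

# One task for respondent 0: show 4 items, then pick best and worst.
shown = rng.choice(n_items, size=4, replace=False)
noisy = U[0, shown] + rng.gumbel(size=shown.size)        # assumed Gumbel noise
best = shown[np.argmax(noisy)]

# Worst is picked among the remaining items using negated utilities (assumption).
remaining = shown[shown != best]
worst = remaining[np.argmax(-U[0, remaining] + rng.gumbel(size=remaining.size))]
```

Feature 1 is left out of `random_cols`, so its column in `respondent_betas` stays at the population value for every respondent.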

Parameters:

n_respondents (int, default 150): Number of respondents.
n_items (int, default 12): Full item pool size. Ignored if item_attributes is provided.
item_attributes (pd.DataFrame, optional): One row per item, index = item name, columns = attributes. If None, attributes are auto-generated: a 3-level brand categorical and two continuous features, price ~ Uniform(0, 1) and quality ~ Normal(0, 1).
utility_formula (str, default "~ 0 + C(brand) + price + quality"): Patsy formula used to expand item_attributes into a design matrix.
true_betas (dict[str, float], optional): Ground-truth part-worths keyed by patsy-expanded feature name. Missing keys are drawn from Normal(0, 1).
n_tasks_per_resp (int, default 12): Tasks per respondent.
subset_size (int, default 4): Items shown per task.
random_attributes (list[str], optional): Feature names whose part-worths vary across respondents. Defaults to all features.
sigma_respondent (float, default 0.4): Scale of per-respondent deviations on the random-feature subset.
items (list[str], optional): Item names. Defaults to ["item_0", ...] when item_attributes is None; otherwise taken from item_attributes.index.
random_seed (np.random.Generator or int, optional): Random state.
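As a rough picture of the inputs described above, the following sketch builds an attribute table shaped like the documented defaults and fills in missing part-worths from Normal(0, 1). It is an approximation under stated assumptions: the brand labels "A", "B", "C" are hypothetical, and `pd.get_dummies` is used as a stand-in for the patsy expansion, so the dummy column names here will not necessarily match the patsy-expanded feature names the function expects in true_betas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_items = 12
items = [f"item_{i}" for i in range(n_items)]

# Attribute table: one row per item, indexed by item name.
brand = pd.Categorical(
    rng.choice(["A", "B", "C"], size=n_items),   # hypothetical brand labels
    categories=["A", "B", "C"],
)
item_attributes = pd.DataFrame(
    {
        "brand": brand,
        "price": rng.uniform(0.0, 1.0, size=n_items),
        "quality": rng.normal(0.0, 1.0, size=n_items),
    },
    index=items,
)

# Stand-in for expanding "~ 0 + C(brand) + price + quality":
# one indicator column per brand level (no intercept) plus the two numeric columns.
brand_dummies = pd.get_dummies(item_attributes["brand"], prefix="brand", dtype=float)
X = pd.concat([brand_dummies, item_attributes[["price", "quality"]]], axis=1)

# Partial ground truth: any feature not listed is drawn from Normal(0, 1).
true_betas = {"price": -1.0}
betas = np.array(
    [true_betas[name] if name in true_betas else rng.normal() for name in X.columns]
)
```

Passing a table like `item_attributes` to the generator would make it ignore n_items and take item names from the index, as described in the parameter list.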

Returns:

task_df (pd.DataFrame): Long-format data with columns respondent_id, task_id, item_id, is_best, is_worst.
item_attributes (pd.DataFrame): The attribute table, indexed by item name. Aligned with items.
ground_truth (dict): Dictionary with keys "betas", "respondent_betas", "feature_names", "random_attributes", "sigma_respondent", "items", and "X". betas is the population part-worth vector; respondent_betas holds the per-respondent part-worth matrix actually used to simulate picks.
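To show how the long-format layout can be consumed, the sketch below computes simple best-minus-worst count scores from a small hand-written frame with the documented columns. The rows are made up for illustration and are not output of the generator; the dtypes of item_id, is_best, and is_worst in the real task_df are not stated here, so the string labels and 0/1 integers are assumptions.

```python
import pandas as pd

# Hand-written stand-in for task_df: one row per (respondent, task, shown item).
task_df = pd.DataFrame(
    {
        "respondent_id": [0, 0, 0, 0, 0, 0],
        "task_id": [0, 0, 0, 1, 1, 1],
        "item_id": ["item_0", "item_1", "item_2", "item_0", "item_2", "item_3"],
        "is_best": [1, 0, 0, 1, 0, 0],
        "is_worst": [0, 0, 1, 0, 0, 1],
    }
)

# Sanity check: each task has exactly one best and one worst pick.
per_task = task_df.groupby(["respondent_id", "task_id"])[["is_best", "is_worst"]].sum()
assert (per_task == 1).all().all()

# Best-minus-worst score per item, normalized by how often the item was shown.
counts = task_df.groupby("item_id").agg(
    shown=("is_best", "size"),
    best=("is_best", "sum"),
    worst=("is_worst", "sum"),
)
counts["bw_score"] = (counts["best"] - counts["worst"]) / counts["shown"]
```

In a recovery test, scores like these can serve as a quick descriptive check before comparing fitted part-worths against ground_truth["betas"].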

Notes

Real conjoint studies typically use balanced designs, in which each item appears equally often. This generator instead draws each task's subset uniformly at random for simplicity, which is adequate for parameter-recovery tests.
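The difference from a balanced design can be seen in a short sketch of uniform subset sampling. This assumes items within a task are drawn without replacement, which the note does not state explicitly; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n_items, n_tasks, subset_size = 12, 12, 4

# Each task shows subset_size distinct items, drawn uniformly from the pool.
tasks = np.stack(
    [rng.choice(n_items, size=subset_size, replace=False) for _ in range(n_tasks)]
)

# How often each item appears for this one respondent. A balanced design
# would give every item n_tasks * subset_size / n_items = 4 appearances;
# uniform draws generally spread around that value.
appearances = np.bincount(tasks.ravel(), minlength=n_items)
print(appearances)
```

With many respondents the per-item appearance counts even out on average, which is why uniform draws tend to be sufficient when the goal is only to check that a model recovers known part-worths.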