generate_maxdiff_conjoint_data

pymc_marketing.customer_choice.synthetic_data.generate_maxdiff_conjoint_data(n_respondents=150, n_items=12, item_attributes=None, utility_formula='~ 0 + C(brand) + price + quality', true_betas=None, n_tasks_per_resp=12, subset_size=4, random_attributes=None, sigma_respondent=0.4, items=None, random_seed=None)

Generate synthetic MaxDiff data with item-attribute utilities (part-worths).

Simulates a MaxDiff survey where each item has a fixed attribute profile and utilities are computed as \(U_i = X_i^\top \beta + \text{noise}\). Respondents optionally carry heterogeneous part-worths on a subset of features (analogous to the random-coefficients formulation in MaxDiffMixedLogit).
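As a rough illustration of the utility model above, the following dependency-free sketch computes the deterministic part \(X_i^\top \beta\) for a toy attribute table and then perturbs the part-worths of one feature for a single respondent. The feature names, numeric values, and the Normal form of the respondent deviations are assumptions made for illustration; they are not taken from the generator's implementation.

```python
import random

# Hypothetical design matrix: one row per item, columns follow feature_names.
feature_names = ["brand_A", "brand_B", "price", "quality"]
X = [
    [1.0, 0.0, 0.2, 0.5],   # item_0
    [0.0, 1.0, 0.8, -0.3],  # item_1
    [1.0, 0.0, 0.5, 1.1],   # item_2
]
betas = [0.5, -0.2, -1.0, 0.8]  # made-up population part-worths


def utilities(X, beta):
    """Deterministic part of U_i = X_i^T beta, one value per item."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def respondent_betas(betas, feature_names, random_attributes, sigma, rng):
    """Shift only the features named in random_attributes (Normal shift assumed)."""
    return [
        b + rng.gauss(0.0, sigma) if name in random_attributes else b
        for name, b in zip(feature_names, betas)
    ]


rng = random.Random(0)
pop_u = utilities(X, betas)  # utilities under the population part-worths
resp_beta = respondent_betas(betas, feature_names, ["price"], 0.4, rng)
resp_u = utilities(X, resp_beta)  # utilities for one simulated respondent
```

Only the `price` coefficient differs between `betas` and `resp_beta` here, which mirrors the idea of restricting heterogeneity to a subset of features.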

Parameters:
n_respondents : int, default 150

Number of respondents.

n_items : int, default 12

Full item pool size. Ignored if item_attributes is provided.

item_attributes : pd.DataFrame, optional

One row per item, index = item name, columns = attributes. If None, attributes are auto-generated: a 3-level brand categorical and two continuous features price ~ Uniform(0, 1) and quality ~ Normal(0, 1).

utility_formula : str, default "~ 0 + C(brand) + price + quality"

Patsy formula used to expand item_attributes into a design matrix.

true_betas : dict[str, float], optional

Ground-truth part-worths keyed by patsy-expanded feature name. Missing keys are drawn from Normal(0, 1).
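Because true_betas is keyed by the expanded column names rather than the raw attribute names, it can help to see what those keys look like. The sketch below assumes brand has levels A, B, and C (the level names are hypothetical); under patsy's naming rules the default formula would then expand to the five columns listed. The `complete_betas` helper is illustrative only: it mimics the documented "missing keys are drawn from Normal(0, 1)" behaviour, and its rejection of unknown keys is a choice made for this sketch, not a statement about the library.

```python
import random

# Columns patsy would produce for "~ 0 + C(brand) + price + quality"
# if brand had levels A, B, C (level names are an assumption).
feature_names = ["C(brand)[A]", "C(brand)[B]", "C(brand)[C]", "price", "quality"]

# Partial specification: only the continuous features are pinned down.
true_betas = {"price": -1.0, "quality": 0.8}


def complete_betas(feature_names, true_betas, rng):
    """Return a full part-worth dict, drawing unspecified entries from Normal(0, 1)."""
    unknown = set(true_betas) - set(feature_names)
    if unknown:
        # Sketch-only safeguard; the library may handle this differently.
        raise KeyError(f"keys do not match any expanded feature: {sorted(unknown)}")
    completed = {}
    for name in feature_names:
        if name in true_betas:
            completed[name] = true_betas[name]
        else:
            completed[name] = rng.gauss(0.0, 1.0)
    return completed


completed = complete_betas(feature_names, true_betas, random.Random(1))
```

A key such as "brand" would not match any column in this sketch, since the categorical attribute is expanded into one column per level.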

n_tasks_per_resp : int, default 12

Tasks per respondent.

subset_size : int, default 4

Items shown per task.

random_attributes : list[str], optional

Feature names whose part-worths vary across respondents. Defaults to all features.

sigma_respondent : float, default 0.4

Scale of per-respondent deviations on the random-feature subset.

items : list[str], optional

Item names. Defaults to ["item_0", ...] when item_attributes is None; otherwise taken from item_attributes.index.

random_seed : np.random.Generator or int, optional

Seed or NumPy random generator controlling the random draws. Pass a value for reproducible output.

Returns:
task_df : pd.DataFrame

Long-format data with columns respondent_id, task_id, item_id, is_best, is_worst.

item_attributes : pd.DataFrame

The attribute table, indexed by item name and aligned with items.

ground_truth : dict

Dictionary with keys "betas", "respondent_betas", "feature_names", "random_attributes", "sigma_respondent", "items", and "X". betas is the population part-worth vector; respondent_betas holds the per-respondent part-worth matrix actually used to simulate picks.
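To make the long-format layout of task_df concrete, the sketch below simulates a single task and emits one row per shown item with the documented column names. The utilities are made up, the Gumbel noise and the "best = highest, worst = lowest noisy utility" rule are common modelling assumptions rather than confirmed details of this generator, and the 0/1 encoding of is_best and is_worst is likewise assumed.

```python
import math
import random


def gumbel(rng):
    """Standard Gumbel draw via the inverse CDF (noise distribution is assumed)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_task(respondent_id, task_id, shown_items, utility, rng):
    """Return long-format rows for one task: one row per shown item."""
    noisy = {item: utility[item] + gumbel(rng) for item in shown_items}
    best = max(noisy, key=noisy.get)
    worst = min(noisy, key=noisy.get)
    return [
        {
            "respondent_id": respondent_id,
            "task_id": task_id,
            "item_id": item,
            "is_best": int(item == best),
            "is_worst": int(item == worst),
        }
        for item in shown_items
    ]


# Made-up utilities for one respondent over four shown items.
utility = {"item_0": 0.7, "item_1": -1.24, "item_2": 0.88, "item_3": 0.1}
rows = simulate_task(0, 0, list(utility), utility, random.Random(7))
```

Each task contributes subset_size rows in this layout, with a single row flagged as best and a single, different row flagged as worst.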

Notes

Real conjoint studies typically use balanced designs; this generator draws task subsets uniformly at random for simplicity, which is adequate for recovery tests.
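The difference between uniform draws and a balanced design can be seen by counting how often each item appears for one respondent. The sketch below uses the default sizes (12 items, 12 tasks, 4 items per task); `draw_tasks` is a hypothetical helper using the standard library, not the generator's own sampling code.

```python
import random
from collections import Counter


def draw_tasks(items, n_tasks, subset_size, rng):
    """Draw each task's item subset uniformly, without replacement within a task."""
    return [rng.sample(items, subset_size) for _ in range(n_tasks)]


items = [f"item_{i}" for i in range(12)]
tasks = draw_tasks(items, n_tasks=12, subset_size=4, rng=random.Random(42))

# Number of times each item was shown to this respondent.
exposure = Counter(item for task in tasks for item in task)

# A perfectly balanced design would show every item 12 * 4 / 12 = 4 times;
# uniform draws generally leave some items over- or under-exposed.
balanced_target = len(tasks) * 4 // len(items)
```

Comparing `exposure` against `balanced_target` shows how far a given random draw departs from balance, which usually averages out across many respondents.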