## 31.1 Basics of {recipes}

The **{recipes}** package is designed to replace the stats::model.matrix() function that you might be familiar with. For example, if you fit a model like the one below

```
##
## Call:
## lm(formula = bill_length_mm ~ species, data = penguins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9338 -2.2049 0.0086 2.0662 12.0951
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.7914 0.2409 161.05 <2e-16 ***
## speciesChinstrap 10.0424 0.4323 23.23 <2e-16 ***
## speciesGentoo 8.7135 0.3595 24.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.96 on 339 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.7078, Adjusted R-squared: 0.7061
## F-statistic: 410.6 on 2 and 339 DF, p-value: < 2.2e-16
```

You can see that our `species`

column, which has the values Adelie, Gentoo, Chinstrap, is automatically dummy-coded for us, with the first level in the factor variable (Adelie) set as the reference group. The **{recipes}** package forces you to be a bit more explicit in these decisions. But it also has a much wider range of modifications it can make to the data.

Using **{recipes}** also allows you to more easily separate the pre-processing and modeling stages of your data analysis workflow. In the above example, you may not have even realized `stats::model.matrix()`

was doing anything for you because it’s wrapped within the `stats::lm()`

modeling code. But with **{recipes}**, you make the modifications to your data first and *then* conduct your analysis.

How does this separation work? With **{recipes}**, you can create a blueprint (or recipe) to apply to a given dataset, without actually applying those operations. You can then use this blueprint iteratively across sets of data (e.g., folds) as well as on new (potentially unseen) data that has the same structure (variables). This process helps avoid data leakage because all operations are carried forward and applied together, and no operations are conducted until explicitly requested.