- Don’t pay attention to data types
- Treat anything with numbers in it as numeric
- Don’t remove infrequently occurring factor levels
- Ignore seasonality
- When modeling retail data, use single months or quarters
- Holidays don’t matter
- Ignore effects of time when generating error metrics
- Mix the same time periods into both your train and test sets
- Don’t use a holdout set to determine model parameters
- Don’t do counts of rows / columns
- As long as you didn’t get an error message when you ran your code, everything is OK – no rows or columns were silently removed.
- Don’t profile your data before building a model
- Assume that unknowns in the target variable of your classification model are negatives (or positives).
- Assume that high scoring models don’t need to be examined
- Don’t worry about the time-stamping of your data
- Don’t set a seed – no need to reproduce the results
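
Read the other way around, a few of these items can be sketched in a handful of lines: set a seed, count rows before and after every filter, and keep train and test periods separate. This is a minimal illustration in plain Python, not a recommended pipeline. The helper names `make_rows`, `drop_missing`, and `time_split`, the cutoff date, and the synthetic sales data are all made up for the example.

```python
import random
from datetime import date, timedelta


def make_rows(n_days, seed):
    """Build hypothetical daily sales records; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    start = date(2020, 1, 1)
    return [
        {"day": start + timedelta(days=i), "sales": rng.randint(0, 100)}
        for i in range(n_days)
    ]


def drop_missing(rows, key):
    """Drop rows where `key` is None and return how many rows were removed."""
    kept = [r for r in rows if r.get(key) is not None]
    return kept, len(rows) - len(kept)


def time_split(rows, cutoff):
    """Train on days strictly before `cutoff`, test on `cutoff` and later."""
    train = [r for r in rows if r["day"] < cutoff]
    test = [r for r in rows if r["day"] >= cutoff]
    return train, test


rows = make_rows(n_days=100, seed=42)
rows[10]["sales"] = None  # simulate one missing value (illustrative only)

# Count rows before and after the filter instead of trusting "no error".
clean, n_dropped = drop_missing(rows, "sales")
print(f"rows before: {len(rows)}, after: {len(clean)}, dropped: {n_dropped}")

# Split by time so no period appears in both train and test.
cutoff = date(2020, 3, 1)  # hypothetical cutoff
train, test = time_split(clean, cutoff)
assert max(r["day"] for r in train) < min(r["day"] for r in test)
assert len(train) + len(test) == len(clean)
```

The two assertions at the end are the point of the sketch: one confirms that every training day precedes every test day, the other that the split itself lost no rows. Real projects would usually do the same checks with a data-frame library, but the idea does not depend on one.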