Data preparation

This is the most important step in machine learning, and the performance of your model depends heavily on it. Some general tips include:

  • Cleaning suspicious and incorrect entries out of your data set
  • Feature engineering and giving more hints to the model
  • Normalizing and scaling input data
  • Randomizing input data order

There is definitely more to that list, and it's strongly recommended to spend more time studying and preparing the input data than selecting and tweaking the machine learning method. If you are familiar with the Python programming language, pandas and scikit-learn are your best friends for data preparation.
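
As a rough illustration, the four tips above might look like this with pandas and scikit-learn. This is only a minimal sketch: the column names (age, income, household_size, label), the validity range for age, and the helper name prepare are all made up for the example, so adapt them to your own data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare(df, seed=0):
    """Hypothetical helper: returns scaled features X and targets y."""
    # Cleaning: drop rows with missing values or impossible readings.
    # The 0..120 range for "age" is an assumption made for this example.
    clean = df.dropna()
    clean = clean[(clean["age"] > 0) & (clean["age"] < 120)].copy()

    # Feature engineering: hand the model a ratio it would otherwise
    # have to discover on its own.
    clean["income_per_member"] = clean["income"] / clean["household_size"]

    # Randomizing order: shuffle the rows with a fixed seed so the
    # result is reproducible.
    clean = clean.sample(frac=1.0, random_state=seed).reset_index(drop=True)

    # Normalizing and scaling: zero mean and unit variance per column.
    features = ["age", "income", "household_size", "income_per_member"]
    scaled = StandardScaler().fit_transform(clean[features])
    X = pd.DataFrame(scaled, columns=features)
    y = clean["label"].to_numpy()
    return X, y


# Tiny made-up data set with one missing age and one impossible age.
raw = pd.DataFrame({
    "age": [25, 40, -3, 61, None, 33],
    "income": [30000, 52000, 41000, 75000, 48000, 39000],
    "household_size": [1, 3, 2, 4, 2, 2],
    "label": [0, 1, 0, 1, 0, 1],
})
X, y = prepare(raw)
```

In a real project you would normally split the data into training and test sets first and fit the scaler on the training part only, so that no information from the test set leaks into the preprocessing.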