It is helpful to view all machine-learning methods as approximations of Bayesian inference. This view makes it possible to devise new approximations or to make existing approximations more precise.
The following two videos present this unified view. They explain it better than I would.
Note that the goal is to minimize the expected loss, where the expectation is taken over all possible examples. Modeling the joint distribution P(X, Y) can help to achieve a small loss on unseen examples.
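A small sketch may make this concrete. Assuming a toy discrete joint distribution P(X, Y) (the numbers below are made up for illustration), the decision rule that minimizes the expected 0-1 loss picks the most probable label under P(Y | X) for each input:

```python
import numpy as np

# Hypothetical joint distribution P(X, Y) over 3 inputs and 2 labels.
# Rows index x, columns index y; all entries sum to 1.
joint = np.array([
    [0.30, 0.05],
    [0.10, 0.25],
    [0.05, 0.25],
])

# Conditional P(Y | X = x), obtained by normalizing each row.
cond = joint / joint.sum(axis=1, keepdims=True)

# Under 0-1 loss, predicting label y at input x incurs expected loss
# 1 - P(Y = y | X = x), so the Bayes-optimal rule picks the most
# probable label for each x.
bayes_rule = cond.argmax(axis=1)

# Expected 0-1 loss of this rule, averaged over the marginal P(X).
p_x = joint.sum(axis=1)
expected_loss = np.sum(p_x * (1 - cond[np.arange(3), bayes_rule]))

print(bayes_rule.tolist())        # label chosen for each x
print(round(float(expected_loss), 3))
```

No classifier can beat this expected loss on data drawn from the same P(X, Y), which is why a good model of the joint (or conditional) distribution helps on unseen examples.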
Even the SVM can be viewed as a probabilistic model.
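One way to see this (my own illustrative framing, not from the videos): both the SVM and logistic regression minimize a sum of per-example losses on the margin plus an L2 penalty on the weights. The L2 penalty is the negative log of a Gaussian prior, so the objective looks like a negative log posterior; logistic loss is an exact negative log-likelihood, while hinge loss only approximates one. The sketch below just compares the two losses:

```python
import numpy as np

# Both objectives have the regularized-risk form
#   sum_i loss(y_i * f(x_i)) + lam * ||w||^2,
# where the L2 term is the negative log of a Gaussian prior on w.
# With logistic loss this is exactly a negative log posterior (MAP);
# hinge loss merely approximates a negative log-likelihood, which is
# the sense in which the SVM is an approximate probabilistic model.
margins = np.linspace(-2.0, 2.0, 5)

hinge = np.maximum(0.0, 1.0 - margins)    # SVM loss
logistic = np.log1p(np.exp(-margins))     # negative log-likelihood of a sigmoid

for m, h, l in zip(margins, hinge, logistic):
    print(f"margin {m:+.1f}: hinge {h:.3f}  logistic {l:.3f}")
```

Both losses penalize small or negative margins and vanish (or nearly vanish) for confidently correct predictions, which is why the two methods behave so similarly in practice.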