Generating a report like this is a must for measuring ML performance. But how do you interpret it?
Accuracy is the most intuitive metric. Classification accuracy is the total number of correct predictions divided by the total number of predictions made on a dataset. For example, if there are 100 data points in the sample, the 0.87 in the stats above means the machine called positive or negative correctly 87% of the time.
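If you want to generate this kind of report yourself, here is a minimal sketch using scikit-learn. The labels below are made up for illustration, since the data behind the report above isn't shown:

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground-truth labels and model predictions;
# not the actual data behind the report discussed above.
y_true = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]

# Accuracy = correct predictions / total predictions
print(accuracy_score(y_true, y_pred))        # 0.8 for this toy data

# Per-class precision, recall, and F1, like the report above
print(classification_report(y_true, y_pred))
```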

So the classification above seems acceptable. However, the precision and recall stats tell a different story, especially for the negative (0) group: of all the actual positive (1) samples, yes, 97% are called correctly, but of all the actual negative samples only 17% are called out correctly (these are the per-class recall values).
Precision quantifies how many of the positive class predictions actually belong to the positive class. So in the above example, if the machine called out 100 cases as positive, only 88 of them are actually positive; the remaining 12 are false positives.
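To make the precision/recall distinction concrete, here is a small sketch deriving both from the confusion matrix. It reuses the same hypothetical labels as above, not the report's actual data:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels, not the report's actual data
y_true = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision: of everything the model called positive, how much really is
precision = tp / (tp + fp)
# Recall: of everything actually positive, how much the model found
recall = tp / (tp + fn)

print(precision, recall)
# scikit-learn's built-ins give the same values
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```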
This gap between the two classes is the signature of skewed (imbalanced) data.
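A quick sanity check for imbalance is simply counting the labels; a sketch with the same hypothetical data:

```python
from collections import Counter

# Hypothetical labels; the report's real data is not shown
y_true = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

# With 7 positives vs 3 negatives, a model that always predicts 1
# already scores 70% accuracy while finding zero negatives.
print(Counter(y_true))  # Counter({1: 7, 0: 3})
```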
F-Measure provides a single score, the harmonic mean of precision and recall, that balances both concerns in one number.
The traditional F-Measure is calculated as follows:
- F-Measure = (2 * Precision * Recall) / (Precision + Recall)
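As a sketch, the same formula in code, checked against scikit-learn's f1_score. The 0.88/0.97 inputs echo the positive-class precision and recall discussed above:

```python
from sklearn.metrics import f1_score

def f_measure(precision, recall):
    # Harmonic mean of precision and recall
    return (2 * precision * recall) / (precision + recall)

# Positive-class numbers from the discussion above: 0.88 precision, 0.97 recall
print(f_measure(0.88, 0.97))   # ~0.923

# scikit-learn computes the same score directly from labels
y_true = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]
print(f1_score(y_true, y_pred))
```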