A practical generalization metric for deep networks benchmarking