Harmonic analysis, and in particular the relation between function smoothness and approximate sparsity of its wavelet coefficients, has played a key role in signal processing and statistical inference for low dimensional data. In contrast, harmonic analysis has thus far had little impact in modern problems involving high dimensional data, or data encoded as graphs or networks. The main contribution of this paper is the development of a harmonic analysis approach, including both learning algorithms and supporting theory, applicable to these more general settings. Given data (be it high dimensional, graph or network) that is represented by one or more hierarchical trees, we first construct multi-scale wavelet-like orthonormal bases on it. Second, we prove that in analogy to the Euclidean case, function smoothness with respect to a specific metric induced by the tree is equivalent to exponential rate of coefficient decay, that is, to approximate sparsity. These results readily translate to simple practical algorithms for various learning tasks. We present an application to transductive semi-supervised learning.