We study the coupled evolution of earthquakes and faults in a model consisting of a seismogenic upper crust governed by damage rheology over a viscoelastic substrate. The damage rheology has two types of functional coefficients: (1) a "generalized internal friction" separating states associated with material degradation and healing and (2) damage rate coefficients for positive (degradation) and negative (healing) changes. The evolving damage modifies the effective elastic properties of material in the upper crust as a function of the ongoing deformation. This simulates the creation and healing of fault systems in the upper seismogenic zone. In addition to the vertically averaged thin sheet approximation we introduce a Green function for three-dimensional elastic half-space for the instantaneous component of deformation. The formulation accounts in an internally consistent manner for evolving deformation fields, evolving fault structures, aseismic energy release, and spatiotemporal seismicity patterns. These developments allow us to simulate long histories of crustal deformation and to study the simultaneous evolution of regional earthquakes and faults for various model realizations. To focus on basic features of a large strike-slip fault system, we first consider a simplified geometry of the seismogenic crust by prescribing initial conditions consisting of a narrow damage zone in an otherwise damage-free plate. For this configuration, the model generates an earthquake cycle with distinct interseismic, preseismic, coseismic, and postseismic periods. Model evolution during each period is controlled by a subset of physical properties, which may be constrained by geophysical, geodetic, rock mechanics, and seismological data. In the more generic case with a random initial damage distribution, the model generates large crustal faults and subsidiary branches with complex geometries. The simulated statistics depend on the space-time window of the observational domain. The results indicate that long healing timescale, τh, describing systems with relatively long memory, leads to the development of geometrically regular fault systems and the characteristic frequency-size earthquake distribution. Conversely, short τh (relatively short memory) leads to the development of a network of disordered fault systems and the Gutenberg-Richter earthquake statistics. For intermediate values of τh the results exhibit alternating overall switching of response from periods of intense seismic activity and the characteristic earthquake distribution to periods of low seismic activity and Gutenberg-Richter statistics.