# An FPGA-based trigger for the search of $\mu^+ \rightarrow e^+ \gamma$ decay in the MEG experiment

L. Galli, Physics Department of Pisa University and INFN Pisa, Largo B. Pontecorvo 3, I-56126 Pisa, Italy

Abstract—A novel trigger has been set up and is now operating in the MEG experiment at the Paul Scherrer Institut, which searches for the Lepton Flavour violating decay $\mu^+ \rightarrow e^+ \gamma$ with unprecedented sensitivity ($10^{-13}$ on the branching ratio). An overview of the trigger architecture is given, together with a description of the design, the main features and the usage of the dedicated boards. Particular emphasis is laid on the use of Field Programmable Gate Arrays to implement the on-line event selection algorithms needed to achieve the highest possible rejection of accidental events while keeping the trigger latency below 400 ns.

## I. INTRODUCTION

The MEG experiment [1] performs a sensitive search for the $\mu^+ \rightarrow e^+ \gamma$ decay, a Lepton Flavour violating process, with a sensitivity on the branching ratio ($10^{-13}$) improved by two orders of magnitude with respect to the current limit [2]. This process is forbidden in the Standard Model of Particle Physics, while it is predicted by a wide class of Supersymmetric theories, whose predictions for the branching ratio lie in the range $10^{-14} \div 10^{-12}$. An experimental observation of this signal would therefore provide incontrovertible evidence in favour of Physics beyond the Standard Model.

The experiment uses the most intense low-energy DC muon beam available, delivered at the Paul Scherrer Institut (PSI), Switzerland, which provides up to $10^8 \mu/s$. The muons are produced by $\pi^+$ decays at rest in the 40 mm carbon target of the proton beam (590 MeV/c, 2 $\mu$Amp) and are therefore monochromatic, with a momentum of 29 MeV/c. The beam also contains a background of $10^9$ positrons, which are effectively removed by an electrostatic separator and beam collimators. After the separator and an energy degrader, the beam is stopped in a 180 $\mu$m thick target. The event signature is a $\gamma$ and an $e^+$, each with an energy of 52.8 MeV, emitted at the same time and in opposite directions. The experimental apparatus combines different detection techniques, each one developed to achieve unprecedented performance at such energies. An 800-liter Liquid Xenon (LXe) calorimeter [3] provides $\gamma$ detection. A magnetic spectrometer made of 16 Drift Chambers (DC) coupled to a quasi-solenoidal magnetic field performs $e^+$ tracking, and 30 plastic scintillator bars (the Timing Counter, TC [4]) are used for $e^+$ timing. The expected resolutions are reported in Table I.

The predominant background of MEG comes from the accidental coincidence of a positron from an ordinary muon decay (a so-called Michel positron) with a photon from a radiative muon decay $\mu^+ \rightarrow e^+ \nu_e \overline{\nu}_{\mu} \gamma$ or from $e^+$ annihilation in flight. The second source of background, named correlated background, is the standard radiative decay $\mu^+ \rightarrow e^+ \nu_e \overline{\nu}_\mu \gamma$ itself. With the expected resolutions, a single event sensitivity of $\approx 5 \times 10^{-14}$ can be reached in $3 \times 10^7$ s of data taking, with an estimated background of $\approx 0.5$ events.

TABLE I EXPECTED EXPERIMENTAL RESOLUTIONS

| Observable                | FWHM           |
|---------------------------|----------------|
| $\Delta E_e$              | 0.7 ÷ 0.9 %    |
| $\Delta E_{\gamma}$       | 4 %            |
| $\Delta \theta_{e\gamma}$ | 17 ÷ 20.5 mrad |
| $\Delta T_{e\gamma}$      | 0.15 ns        |

The trigger system plays an essential role: it processes the detector signals to find the signature of a $\mu^+ \rightarrow e^+\gamma$ event in a severe pile-up environment and must provide a powerful background rejection, so as to reduce the trigger rate below 10 Hz and keep the livetime $\geq 80 \%$ while preserving a signal efficiency $\epsilon \geq 95\%$. The MEG waveform digitizers impose a constraint on the trigger system: its latency must be shorter than the 500 ns depth of the digitizers' cyclic memories [5]. The trigger therefore has to provide the Stop signal no later than 400 ns after the event occurrence. Finally, it has to be flexible enough to accept other event types needed for detector calibration and monitoring.

This paper presents the architecture of the trigger system, the Firmware developed and its performance.

## II. TRIGGER STRATEGY

The $\mu^+ \rightarrow e^+ \gamma$ signature for muon decay at rest in the laboratory is fully determined by two-body kinematics, as described in Section I; the useful observables for event selection are therefore the energies of the $\gamma$ and the $e^+$, their time coincidence and their opening angle. The requirement on the global trigger latency forces the system to use fast-response detectors, namely the LXe calorimeter and the TC, both read out by PMTs, and prevents us from using information from the DC detector. The trigger algorithm discriminates on the $\gamma$ energy $(E_{\gamma})$, the $e^+\gamma$ time difference $(\Delta T_{e^+\gamma})$ and the relative opening angle $(\theta_{e^+\gamma})$.

The estimator $E_{\gamma}$ is extracted from the pulse height of the sum of the LXe waveforms. An online calibration that accounts for PMT gains, quantum efficiencies (QE) and geometrical coverage is applied.
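As an illustration of this estimator, the following Python sketch subtracts a per-channel pedestal, applies a per-channel calibration weight and takes the maximum of the summed waveform. The function name, the array layout and the single `weights` factor that lumps together gain, QE and coverage corrections are assumptions made for the example; the sketch is not the firmware implementation.

```python
import numpy as np


def online_energy_estimate(waveforms, pedestals, weights):
    """Return (pulse height, summed waveform) of the calibrated LXe sum.

    waveforms: (n_pmt, n_samples) raw FADC counts
    pedestals: (n_pmt,) baseline of each channel, in counts
    weights:   (n_pmt,) hypothetical calibration factor per channel
               (gain, QE and coverage corrections lumped together)
    """
    waveforms = np.asarray(waveforms, dtype=float)
    pedestals = np.asarray(pedestals, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Pedestal subtraction and calibration, channel by channel.
    calibrated = (waveforms - pedestals[:, None]) * weights[:, None]

    # Sum over channels, then take the pulse height of the sum.
    summed = calibrated.sum(axis=0)
    return float(summed.max()), summed


if __name__ == "__main__":
    raw = [[10, 10, 30, 50, 20],   # channel 0, pedestal 10
           [5, 5, 15, 25, 10]]     # channel 1, pedestal 5
    peak, summed = online_energy_estimate(raw, pedestals=[10, 5], weights=[1.0, 2.0])
    print(peak, summed)
```

The returned pulse height would then be compared with the energy threshold.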

$T_{\gamma}$ is obtained from a parabolic interpolation of the leading edges of the inner-face PMT signals. The algorithm selects the time of the PMT that collects the maximum amount of light. The same algorithm is applied to the Timing Counter signals to determine the $T_{e^+}$ estimator; in that case the selected time corresponds to the first TC bar hit.
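The exact form of the interpolation is not spelled out here. One possible reading, given as a minimal sketch, is to pass a parabola through the three samples around the first threshold crossing and solve for the time at which it reaches the threshold. The threshold-crossing definition, the function name and the 10 ns sample period (taken from the 100 MHz sampling) are assumptions of the example.

```python
import numpy as np

SAMPLE_PERIOD_NS = 10.0  # 100 MHz sampling


def leading_edge_time(samples, threshold, period_ns=SAMPLE_PERIOD_NS):
    """Time (ns) at which a parabola through three samples crosses threshold.

    Returns None if the waveform never reaches the threshold or if the
    crossing is too close to either end of the sample window.
    """
    y = np.asarray(samples, dtype=float)
    above = np.nonzero(y >= threshold)[0]
    if above.size == 0:
        return None
    i = int(above[0])                 # first sample at or above threshold
    if i == 0 or i == y.size - 1:
        return None

    # Exact parabola through samples i-1, i, i+1 (x in units of samples).
    x = np.array([i - 1, i, i + 1], dtype=float)
    coeffs = np.polyfit(x, y[i - 1:i + 2], 2)

    # Solve parabola(x) = threshold and keep the root on the leading edge.
    coeffs[-1] -= threshold
    roots = np.roots(coeffs)
    real = roots[np.isreal(roots)].real
    inside = real[(real >= i - 1 - 1e-9) & (real <= i + 1e-9)]
    if inside.size == 0:
        return None
    return float(inside.min()) * period_ns


if __name__ == "__main__":
    # Samples of y = x**2: the crossing of 2.25 lies at x = 1.5 samples.
    print(leading_edge_time([0, 1, 4, 9], threshold=2.25))
```

The interpolation gives a time estimate finer than the 10 ns sampling step.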



Fig. 1. Architecture of the trigger system

The impinging point of the photon on the calorimeter and the TC crossing point of the positron provide an estimator of $\theta_{\gamma e}$. The impinging position of the $\gamma$ is given by the position of the PMT collecting the largest amount of light. The $e^+$ crossing position is given by the coordinate of the hit bar and by the position along the bar, obtained by comparing the PMT pulse heights.

## III. ARCHITECTURE

The Trigger system is arranged in a multi-layer structure, as shown in Figure 1: a first layer hosts the so-called Type1 boards, which digitize the analog signals, while a second and a third layer host Type2 boards, which collect the Type1 data and run the selection algorithms. In addition, an Ancillary System was developed to ensure synchronous operation of the tree (data flow and algorithm execution). The logic is completely programmed into FPGAs and operates at a frequency of 100 MHz. The system consists of 40 Type1 boards, 6 Type2 boards and 4 Ancillary boards.

### A. Type1 Board

Type1 boards are compliant with the 6U VME standard. Each board receives 16 analog signals from the experimental devices. These signals are digitized by 8 AD9218 Flash ADCs (FADCs) [6], with 10-bit resolution and 100 MHz sampling speed. A Xilinx Virtex-II Pro FPGA [7], [8], [9] receives the digital data and runs the first-level algorithms, which consist of pedestal subtraction and calibration of all channels. Data are transmitted to the second-level boards through DS90CR481 LVDS serializers [11] at a transfer rate of 4.8 Gbit/s. The clock reference signal, distributed by the Ancillary boards, is multiplied by a factor of 5 and distributed across the board by a Roboclock CY7B994V [12]. This chip allows the skews with respect to the carrier signal to be set independently, which is needed to synchronize FADC digitization, FPGA algorithm execution and LVDS data transmission.

1) FE Electronics: The differential inputs of the Type1 FADCs are driven by an AD8138 [13] mounted on dedicated Front End (FE) boards. The FE electronics can also shift the baseline channel by channel in order to exploit the FADC dynamic range. The FE boards also apply an RC integration to the input signals in order to improve the online time estimation (see Section II); the high-frequency cut-off is 30 MHz for the LXe channels and 15 MHz for the TC ones.

### B. Type2 Board

Type2 boards are compliant with the 9U VME standard and are used at the intermediate and top levels of the trigger tree (see Figure 1). A Type2 board receives up to 9 LVDS buses from lower-level boards. Each LVDS bus is 48 bits wide, and the translation from LVDS to CMOS signals is performed by a 48-bit DS90CR482 deserializer, for a data transfer rate of 4.8 Gbit/s per bus. The data link to the upper level is provided by 2 DS90CR481 serializers. The transferred data are processed by Xilinx Virtex-II Pro FPGAs, with the algorithm processing clocked at 100 MHz. Clock signals are distributed by a Roboclock CY7B994V, as in the case of Type1 boards. The so-called Final Type2 collects the full information from the LXe and the TC to look for candidate events; if one is found, the Stop signal to the DAQ is asserted. Similarly, the Final Type2 waits for the Busy condition to be cleared by all DAQ computers and generates a Start signal as soon as this happens. These signals are embedded in a control bus that includes other useful information for the DAQ software, such as the event counter and the trigger type.

### C. Ancillary Board

Ancillary boards are 9U VME boards; they distribute the reference CLK and the control signals (Start, Stop and Sync) to the entire Trigger System. The Ancillary System is arranged in a Master-Slave structure. The Master board hosts the reference CLK oscillator, a SARONIX SEL3935 [14] (19.44 MHz, jitter $\leq$ 30 ps over 100,000 cycles), and receives the control signals from the Final Type2 board. These signals are fanned out through MAXIM LVDS transmitters [15] (maximum jitter $\leq$ 13 ps, skew $\leq$ 60 ps peak-to-peak over the 10 outputs) by 3 Slave boards and distributed to all boards of the Trigger System. The Ancillary boards are equipped with programmable delay generators for the distribution of the control signals.

Proper trigger operation is guaranteed provided that the algorithm execution on each board and the data flow along the tree are synchronized. This is achieved by fine tuning the skews of the Roboclock CLK signals. We developed a tool to monitor the synchronous operation of the trigger: it checks both data transmission and memory addressing.

The Type1 boards together process about 80 GB/s of digitized data (40 boards × 16 channels × 10 bit × 100 MHz); the associated data transmission to the second layer is 30 GB/s.
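As a rough cross-check of the data volumes, the arithmetic can be redone from the board parameters quoted in this section. The sketch below assumes that every channel produces one 10-bit sample per 100 MHz clock cycle and that each 48-bit LVDS link carries one word per cycle (consistent with the 4.8 Gbit/s quoted per link); the constant and function names are illustrative only.

```python
N_TYPE1_BOARDS = 40
CHANNELS_PER_BOARD = 16
ADC_BITS = 10
CLOCK_HZ = 100e6
LINK_WIDTH_BITS = 48   # width of one LVDS bus


def to_gbyte_per_s(bits_per_s):
    """Convert a rate in bit/s to GB/s (1 GB = 1e9 bytes)."""
    return bits_per_s / 8.0 / 1e9


# Raw digitized data entering the Type1 FPGAs.
digitized_bits = N_TYPE1_BOARDS * CHANNELS_PER_BOARD * ADC_BITS * CLOCK_HZ

# One LVDS link, and one such link per Type1 board (assumption).
link_bits = LINK_WIDTH_BITS * CLOCK_HZ
type1_links_bits = N_TYPE1_BOARDS * link_bits

if __name__ == "__main__":
    print("digitized input :", to_gbyte_per_s(digitized_bits), "GB/s")
    print("single LVDS link:", link_bits / 1e9, "Gbit/s")
    print("Type1 links     :", to_gbyte_per_s(type1_links_bits), "GB/s")
```

Under these assumptions the digitized input comes to 80 GB/s and the Type1 output links alone to 24 GB/s, the same order as the transmission figure quoted above, which may include further links.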

## IV. ALGORITHM FIRMWARE

The trigger logic is implemented in FPGAs. The choice of an FPGA-based digital trigger makes the system versatile, since different selection criteria can be applied simply by reloading the configuration file onto the chip. The main structure of the logic is shown in Figure 2. The core consists of the algorithm block, which is specific to each board, depending on its position in the trigger hierarchy and on the detector signals it has to handle. In all, 8 different versions of the firmware have been developed for Type1 boards and 5 for Type2 boards.

Fig. 2. FPGA firmware structure.

Selection algorithms are implemented using both combinatorial and sequential logic. More complex operations (such as bus multiplication, waveform interpolation and so on), which would otherwise require intricate or time-consuming logic, can instead be performed in 1 CLK cycle by resorting to RAM-based Look Up Tables (LUTs). These are widely used in our firmware, for instance for PMT gain compensation or for matching the relative $e^+ - \gamma$ direction patterns. Common to all firmware versions is a double stage of data storage, at the input and at the output of the algorithm block, which provides a powerful debugging tool to check both algorithm execution and board synchronization.
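The idea behind a LUT-based multiplication can be sketched in software: the product of every possible 10-bit ADC code with a fixed gain is precomputed once, so that applying the gain at run time is a single table read. The table layout, the 12-bit saturated output and the function name below are assumptions made for the example, not a description of the actual firmware.

```python
def build_gain_lut(gain, adc_bits=10, out_bits=12):
    """Precompute round(code * gain) for every ADC code, with saturation."""
    max_out = (1 << out_bits) - 1
    lut = []
    for code in range(1 << adc_bits):
        value = int(code * gain + 0.5)   # round half up
        lut.append(min(value, max_out))  # saturate instead of wrapping
    return lut


if __name__ == "__main__":
    lut = build_gain_lut(1.25)
    adc_code = 400
    # Run-time "multiplication" is one memory access.
    print(lut[adc_code])
```

Changing a PMT gain then only requires reloading the table contents, not changing the logic.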

The system can generate up to 32 different trigger types ordered in a stack. The trigger for MEG signal events is assigned the highest priority, followed by MEG events with looser selection cuts. Trigger types used for single-detector calibration and stability monitoring are at the bottom of the stack. MEG events can be mixed with any other type, with proper pre-scaling settings, in order to compute the signal efficiency and monitor the detector stability during normal data acquisition. The fraction of each trigger type is therefore programmable on a run-by-run basis, by means of pre-scaling factors defined at the beginning of each run.
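A prioritized, pre-scaled trigger stack of this kind can be modelled in a few lines. The sketch below is a hypothetical software model, not the firmware logic: each trigger type has a counter, a type with pre-scaling factor N is accepted on every N-th occurrence, a factor of 0 disables the type, and the accepted type with the highest priority (lowest index) labels the event.

```python
class TriggerStack:
    """Toy model of a priority-ordered trigger stack with pre-scaling."""

    MAX_TYPES = 32

    def __init__(self, prescales):
        if len(prescales) > self.MAX_TYPES:
            raise ValueError("at most 32 trigger types are supported")
        self.prescales = list(prescales)
        self.counters = [0] * len(self.prescales)

    def decide(self, fired):
        """fired[i] is True if trigger type i fired in this clock cycle.

        Returns the index of the accepted trigger type, or None.
        """
        accepted = None
        for idx, (hit, factor) in enumerate(zip(fired, self.prescales)):
            if not hit or factor == 0:
                continue
            self.counters[idx] += 1
            if self.counters[idx] >= factor:
                self.counters[idx] = 0
                if accepted is None:     # highest priority wins
                    accepted = idx
        return accepted


if __name__ == "__main__":
    # Type 0: signal trigger, never pre-scaled. Type 1: calibration, 1 in 3.
    stack = TriggerStack([1, 3])
    print([stack.decide([False, True]) for _ in range(6)])
    print(stack.decide([True, True]))
```

In this model the pre-scaling factors are the only run-dependent settings, which mirrors the run-by-run tuning described above.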

The choice of an FPGA chip as a platform follows from the need to reduce the trigger latency as much as possible, which is mandatory in such a high-rate environment. With a 100 MHz clock, we achieved an overall latency of $\approx 400$ ns for the main $\mu^+ \rightarrow e^+ \gamma$ trigger, including the time needed for data transmission through the trigger tree.



Fig. 3. Example of LXe summed waveform



Fig. 4. Online Energy resolution for a 55 MeV line

## V. TRIGGER PERFORMANCE

The system is now operating at PSI during the first Physics run of the experiment. In this section results are shown for the set-up of the  $\mu^+ \rightarrow e^+ \gamma$  trigger; an estimate of the trigger efficiency is provided for each observable being used. Resulting rates and livetime fraction are computed as well.

### A. $E_{\gamma}$ estimator

An efficient $E_{\gamma}$ estimator is the pulse height of the weighted sum of the LXe PMT waveforms. This requires an on-line calibration taking into account PMT and electronics gains, QE and PMT coverage. Figure 3 shows an example of such a summed waveform.

The Physics run followed a LXe calibration obtained with $\gamma$s from $\pi^0$ decays, induced by a beam of negative pions undergoing the charge-exchange reaction $\pi^- p \rightarrow \pi^0 n$ on a liquid Hydrogen target. Events were collected upon coincidence of the LXe with a NaI tag detector located on the opposite side, a configuration in which the energy of the two $\gamma$s is close to either edge (55 or 83 MeV) of the spectrum in the Lab frame. The lower-energy $\gamma$s are particularly important, as they allow us to study the LXe response function at an energy very close to the MEG signal. The on-line reconstructed spectrum obtained for those events in the LXe is shown in Figure 4. With a 9% FWHM resolution, it has been possible to set the $E_{\gamma}$ discrimination threshold at 45 MeV, which guarantees an efficiency $\epsilon_{E_{\gamma}} \geq 99\%$.



Fig. 5. Online Time Resolution

### B. Time Coincidence

The on-line time reconstruction is based on a parabolic fit of the rising edge of the LXe and TC PMT waveforms. The values obtained need to be calibrated for the relative offset between the two detectors, which is due both to cable lengths and to different algorithm latencies. To accomplish this task, the MEG experiment uses a proton beam delivered by a Cockcroft-Walton accelerator to induce radiative nuclear reactions on a boron-rich target, giving rise to an excited <sup>12</sup>C\* level which decays by emitting two $\gamma$s in cascade, of 4.4 and 11.7 MeV. This provides an effective tool to study the relative timing of the two detectors. The resulting distribution is shown in Figure 5. The on-line resolution on $\Delta T_{e^+\gamma}$ turned out to be better than 4 ns, while the offset was found to be 25 ns. This allowed us to set a 20 ns wide time-coincidence window, with an efficiency $\epsilon_{\Delta T_{e^+\gamma}} \geq 99\%$.
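The coincidence condition itself reduces to a comparison after offset subtraction. In the minimal sketch below the 25 ns offset and the 20 ns window width are taken from the text, while the sign convention of the offset (positron time minus photon time) and the function name are assumptions.

```python
OFFSET_NS = 25.0        # measured relative offset between TC and LXe
HALF_WINDOW_NS = 10.0   # half of the 20 ns wide coincidence window


def in_time_coincidence(t_positron_ns, t_gamma_ns,
                        offset_ns=OFFSET_NS, half_window_ns=HALF_WINDOW_NS):
    """True if the offset-corrected time difference lies inside the window."""
    delta_t = (t_positron_ns - t_gamma_ns) - offset_ns
    return abs(delta_t) <= half_window_ns


if __name__ == "__main__":
    print(in_time_coincidence(130.0, 100.0))   # corrected difference: 5 ns
    print(in_time_coincidence(150.0, 100.0))   # corrected difference: 25 ns
```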

### C. Direction Match

The long drift time of ionization electrons in the DC (of the order of a few hundred ns) is incompatible with the requirement of a short trigger latency and prevents us from using DC information to reconstruct the positron direction at trigger level. The selection of events with a back-to-back $e^+ - \gamma$ pair is therefore based on the correlation between the impinging point of the $\gamma$ on the calorimeter and that of the $e^+$ on the TC. Figure 6 shows the 95% CL hit domain of such a positron, as predicted by Monte Carlo simulation of $\mu^+ \rightarrow e^+ \gamma$ events, for 3 different directions of the $\gamma$; the $\gamma$ direction is reconstructed from the position of the PMT collecting the largest number of photoelectrons. For each LXe PMT index, a LUT returns the set of indices of the TC sectors compatible with the hit domain of a 52.8 MeV positron emitted in the opposite direction. The direction condition is satisfied whenever a hit on the TC lies in this domain. The values loaded in this LUT will be cross-checked with experimental data as soon as sufficient statistics have been collected.
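The LUT-based direction match can be pictured as a bit-mask comparison. In the sketch below the PMT indices 0, 103 and 215 are those used in Fig. 6, but the TC sector ranges stored for them, the assumption of 30 sectors (one per bar) and all names are hypothetical placeholders rather than the values loaded in the firmware.

```python
N_TC_SECTORS = 30  # assumption: one sector per TC bar


def sectors_to_mask(sectors):
    """Encode a collection of TC sector indices as an integer bit mask."""
    mask = 0
    for sector in sectors:
        if not 0 <= sector < N_TC_SECTORS:
            raise ValueError(f"invalid TC sector index: {sector}")
        mask |= 1 << sector
    return mask


# Hypothetical LUT: LXe PMT index -> mask of compatible TC sectors.
DIRECTION_LUT = {
    0: sectors_to_mask(range(0, 4)),
    103: sectors_to_mask(range(12, 18)),
    215: sectors_to_mask(range(26, 30)),
}


def direction_match(max_light_pmt, tc_hit_mask, lut=DIRECTION_LUT):
    """True if any TC hit lies in the domain allowed for this PMT index."""
    return (lut.get(max_light_pmt, 0) & tc_hit_mask) != 0


if __name__ == "__main__":
    print(direction_match(103, sectors_to_mask([14])))  # hit inside the domain
    print(direction_match(103, sectors_to_mask([2])))   # hit outside the domain
```

A single AND between the LUT output and the TC hit pattern is enough to take the decision, which suits a one-clock-cycle implementation.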

### D. Trigger rate, DAQ and livetime

As already stated in Section I, the MEG background is expected to be dominated by accidentals. If so, the  $\mu^+ \rightarrow e^+\gamma$  trigger rate can be expressed by

$$R_{TRG} = R_{\gamma} \times R_{TC} \times f_{\theta} \times 2\Delta T \tag{1}$$



Fig. 6. Example of the TC region hit by an $e^+$ from a $\mu^+ \rightarrow e^+ \gamma$ decay in the case of a $\gamma$ converting in front of PMT #0 (top right corner of the acceptance), PMT #103 (center of the acceptance) and PMT #215 (bottom left corner of the acceptance) in the LXe calorimeter

where $R_{\gamma}$ is the expected $\gamma$ rate over threshold, $R_{TC}$ the e<sup>+</sup>-hit rate on the TC, $f_{\theta}$ the rejection factor given by the direction match and $2\Delta T$ the time coincidence window. For a $\mu$-stop rate = 3 × 10<sup>7</sup> s<sup>-1</sup>, $\gamma$ energy threshold = 45 MeV, $\Delta T$ = 10 ns and 95% efficiency on $\theta_{\gamma e}$, $R_{TRG}$ is expected to be $\approx$ 10 Hz. At the start of the Physics run, in September 2008, with the final configuration of the $\mu^+ \rightarrow e^+ \gamma$ trigger, we measured $R_{TRG}$ = 6 Hz, close to expectation.
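Equation (1) is straightforward to evaluate numerically. The sketch below only encodes the formula; the single rates and the rejection factor used in the usage example are placeholder numbers chosen to make the arithmetic easy to follow and are not MEG measurements.

```python
def accidental_trigger_rate(r_gamma_hz, r_tc_hz, f_theta, delta_t_s):
    """Accidental trigger rate R_TRG = R_gamma * R_TC * f_theta * 2 * Delta T."""
    return r_gamma_hz * r_tc_hz * f_theta * 2.0 * delta_t_s


if __name__ == "__main__":
    # Placeholder inputs (NOT measured values): 2 kHz of photons over
    # threshold, 1 MHz of TC hits, direction-match factor 0.2.
    rate = accidental_trigger_rate(2e3, 1e6, 0.2, 10e-9)
    print(f"{rate:.1f} Hz")

    # The rate is linear in the coincidence window: doubling Delta T
    # doubles the accidental rate.
    print(accidental_trigger_rate(2e3, 1e6, 0.2, 20e-9) / rate)
```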

Data recorded in the WFD cyclic memories are read out by the online DAQ clusters by means of the 2EVME transfer protocol. The dead time per event of our DAQ system is approximately 40 ms, corresponding to an 83% livetime fraction, in agreement with the experimental requirements.
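The link between dead time, trigger rate and livetime can be illustrated with a simple non-paralyzable dead-time model, in which the livetime fraction is $1/(1 + R\tau)$ for an input trigger rate $R$ and a dead time per event $\tau$. The choice of this model and the function names are assumptions of this sketch; the way the livetime was actually evaluated is not specified here.

```python
def livetime_fraction(trigger_rate_hz, deadtime_s):
    """Livetime fraction for a non-paralyzable dead-time model."""
    return 1.0 / (1.0 + trigger_rate_hz * deadtime_s)


def rate_for_livetime(livetime, deadtime_s):
    """Input trigger rate that yields the given livetime fraction."""
    return (1.0 / livetime - 1.0) / deadtime_s


if __name__ == "__main__":
    deadtime = 40e-3  # 40 ms per event
    print(f"livetime at 6 Hz     : {livetime_fraction(6.0, deadtime):.3f}")
    print(f"rate for 83% livetime: {rate_for_livetime(0.83, deadtime):.2f} Hz")
```

Within this model, a 40 ms dead time and an 83% livetime correspond to an input rate of about 5 Hz, the same order as the measured trigger rate.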

## VI. CONCLUSIONS

The FPGA-based trigger of the MEG experiment is capable of performing a powerful background rejection with a 400 ns latency. The system is arranged in a multi-layer structure with 3 types of dedicated VME boards. The complete system is synchronous with a 100 MHz reference clock. The on-line resolutions achieved on $E_{\gamma}$, $\Delta T_{e^+\gamma}$ and the $e^+ - \gamma$ relative direction are compatible with the requirement of suppressing the background to the level of $\approx 10$ Hz, while keeping the overall efficiency above 90%.

## REFERENCES

- [1] Baldini et al., The MEG Experiment: search for $\mu^+ \rightarrow e^+ \gamma$ decay at PSI, Proposal to INFN, September 2002.
- [2] MEGA Collaboration, Phys. Rev. D 65 (2002) 112002 http://arxiv.org/abs/hep-ex/0111030, hep-ex/0111030.
- [3] A. Baldini et al., "Liquid Xe scintillation calorimetry and Xe optical properties", arXiv:physics/0401072.
- [4] MEG Collaboration, "MUEGAMMA timing counter prototype test", PSI Annual Report (2001)
- [5] S. Ritt, contribution to the IEEE 2004 Conference, Nuclear Science Symposium, available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1462369
- [6] Analog Devices, AD9218, 10 bit 3V Dual A/D converter, http://www.analog.com
- [7] See documents and information available at http://www.xilinx.com
- [8] Xilinx, "Virtex II Pro and Virtex II Pro X FPGA user's guide", UG12(v3.0) (2004) available at http://www.xilinx.com

- [9] Xilinx, "Virtex II Pro Platform FPGAs: Complete Data Sheet", DS083 (2004) available at http://www.xilinx.com
- [10] Xilinx, "Virtex II Pro Platform FPGAs: Complete Data Sheet", DS083 (2004) available at http://www.xilinx.com
- [11] DS90CR481/DS90CR483 48-bit LVDS Channel Link SER/DES-65-112 MHz, http://www.national.com
- [12] Cypress, High-speed Multi-phase PLL clock Buffer, http://www.datasheetcatalog.com
- [13] Analog Devices, AD8138, Low Distortion Differential ADC driver, http://www.analog.com
- [14] SaRonix, Crystal Clock Oscillator, http://www.pericom.com/saronix
- [15] Maxim, Low-Jitter 800 Mbps 10-Port LVDS Repeaters, http://www.maxim-ic.com