= Fault System Solution Zip File Format Documentation =

UCERF3 (and potentially other forecast) data is stored in a zip file format. This page describes how to parse these zip files. If you are using Java, a parser is already available in OpenSHA in the [source:trunk/dev/scratch/UCERF3/utils/FaultSystemIO.java scratch.UCERF3.utils.FaultSystemIO] class.

== Zip File Contents ==

The following files constitute a Fault System Solution. See File Formats Used below for descriptions and implementation details for each format.

||= File Name =||= File Format =||= Optional? =||= Description =||
||'''fault_sections.xml'''||[#FaultsectiondataXMLfile XML]||no||This XML file describes each sub section in the Fault System. The sub section indexes defined here are referred to in the rup_sections.bin file when defining ruptures.||
||'''grid_sources.xml'''||[#GridSourcesXMLfile XML]||yes||This XML file, if present, gives gridded seismicity MFDs at each node in the region that this solution covers.||
||'''grid_sources_reg.xml'''||[#GridSourcesXMLfile XML]||yes||This XML file, if present, gives the region associated with gridded seismicity MFDs. It is used in conjunction with '''grid_sources.bin''' as a more space efficient alternative to '''grid_sources.xml'''.||
||'''grid_sources.bin'''||[#GridSourcesBinaryfile Double array list binary]||yes||This binary file, if present, gives gridded seismicity MFDs at each node in the region described in '''grid_sources_reg.xml'''.||
||'''info.txt'''||ASCII||yes||This text file, if present, contains metadata describing the solution.||
||'''mags.bin'''||[#Doublearraybinaryfile Double array binary]||no||This file gives magnitudes for each rupture. It contains one double value for each rupture index, in order.||
||'''rakes.bin'''||[#Doublearraybinaryfile Double array binary]||no||This file gives average rakes for each rupture. It contains one double value for each rupture index, in order.||
||'''rates.bin'''||[#Doublearraybinaryfile Double array binary]||no||This file gives annualized rates for each rupture. It contains one double value for each rupture index, in order.||
||'''rup_areas.bin'''||[#Doublearraybinaryfile Double array binary]||no||This file gives areas for each rupture in SI units (square meters). It contains one double value for each rupture index, in order.||
||'''rup_lengths.bin'''||[#Doublearraybinaryfile Double array binary]||yes||This file, if present, gives lengths for each rupture in SI units (meters). It contains one double value for each rupture index, in order.||
||'''rup_mfds.bin'''||[#MFDBinaryFile MFD Binary]||yes||This file, if present, gives magnitude frequency distributions for each rupture. It contains one function (consisting of an x value and y value data array) for each rupture index, in order.||
||'''rup_sections.bin'''||[#Integerarraylistbinaryfile Integer array list binary]||no||This file lists the sub sections involved in each rupture. It consists of numRuptures arrays, each of which lists the sub section indexes (as defined in fault_sections.xml) for that rupture.||
||'''sect_areas.bin'''||[#Doublearraybinaryfile Double array binary]||yes||This file, if present, gives areas for each fault sub section in SI units (square meters). It contains one double value for each sub section index, in order.||
||'''sect_slips.bin'''||[#Doublearraybinaryfile Double array binary]||yes||This file, if present, gives slip rates after any aseismic and subseismogenic reductions for each fault sub section in SI units (meters per year). It contains one double value for each sub section index, in order.||
||'''sect_slips_std_dev.bin'''||[#Doublearraybinaryfile Double array binary]||yes||This file, if present, gives standard deviations of slip rates after any aseismic and subseismogenic reductions for each fault sub section in SI units (meters per year). It contains one double value for each sub section index, in order.||
||'''sub_seismo_on_fault_mfds.bin'''||[#MFDBinaryFile MFD Binary]||yes||This file, if present, gives subseismogenic magnitude frequency distributions for each sub section. It contains one function (consisting of an x value and y value data array) for each sub section index, in order.||

The following files are not required, and their contents are not documented in detail here, but they may be present in zip files generated by the UCERF3 inversion. They give metadata about the associated logic tree branch and other inversion metadata.

||= File Name =||= File Format =||= Optional? =||= Description =||
||'''close_sections.bin'''||[#Integerarraylistbinaryfile Integer array list binary]||yes||This file lists, for each sub section (in order), all of the other sub sections that the given sub section connects with in the Fault System.||
||'''cluster_rups.bin'''||[#Integerarraylistbinaryfile Integer array list binary]||yes||Some fault systems are separated into clusters of interconnected faults. This file lists, for each cluster, all of the rupture indexes which are part of the given cluster.||
||'''cluster_sects.bin'''||[#Integerarraylistbinaryfile Integer array list binary]||yes||Some fault systems are separated into clusters of interconnected faults. This file lists, for each cluster, all of the sub section indexes which are part of the given cluster.||
||'''inv_rup_set_metadata.xml'''||XML||yes||This file gives metadata for the logic tree branch and rupture filtration criterion (laugh test filter) used to generate this solution.||
||'''inv_sol_metadata.xml'''||XML||yes||This file gives metadata for the UCERF3 inversion including equation set weights and final simulated annealing energies.||
||'''rup_avg_slips.bin'''||[#Doublearraybinaryfile Double array binary]||yes||This file, if present, gives the average slip for each rupture in SI units (meters). It contains one double value for each rupture index, in order.||

== File Formats Used ==

You must write a parser for each of the following file formats in order to load a fault system solution.

=== Double array binary file ===

These files contain an array of double values in a binary format. They simply contain a series of big endian 8-byte (64-bit) double precision floating point numbers. The size of the file in bytes is equal to the number of values multiplied by 8.

=== Integer array list binary file ===

These files contain a list of integer arrays in a binary format. All file entries are 4-byte (32-bit) big endian integer values. The first value in the file is the number of integer arrays stored in the file. Then each array is written to the file by first writing the number of elements in the array, then each value in the array. For example, consider the following 3 arrays:

{{{
[ 0 6 2 4 ]
[ 3 6 2 ]
[ 3 7 9 1 4 7 ]
}}}

This would be written as (all stored as big endian 4-byte integers):

'''[[span(style=color: #FF0000, 3)]] [[span(style=color: #0000FF, 4)]] 0 6 2 4 [[span(style=color: #0000FF, 3)]] 3 6 2 [[span(style=color: #0000FF, 6)]] 3 7 9 1 4 7'''

In this example, '''[[span(style=color: #FF0000, the number of arrays is in red)]]''', '''[[span(style=color: #0000FF, each array's size in blue)]]''', '''and array data is in black'''.

=== Fault section data XML file ===

Each fault subsection is stored in an XML file, an example of which is shown below.

{{{
...
...
...
}}}

=== Grid Sources XML file ===

Some solutions will contain gridded seismicity magnitude frequency distributions. Here is an example XML file:

''NOTE 1: UCERF3 uses the RELM region evenly discretized at 0.1 degrees for gridded seismicity. Due to the complexities involved in reproducing our gridding exactly, a file is posted here with grid node indexes and locations for this region: http://opensha.usc.edu/ftp/kmilner/ucerf3/relm_gridded_region.csv''

''NOTE 2: This file is now deprecated as it is very large and does not compress well. The newer version of the file, grid_sources_reg.xml, contains only the evenlyGriddedGeographicRegion region element below, and the MFDs are stored in a [#GridSourcesBinaryfile binary format].''

{{{
...
...
...
...
}}}

=== Grid Sources Binary file ===

This is a binary representation of grid source MFDs. All values are stored in a binary format (8-byte big endian floating point values) as a list of double arrays. First, the total number of arrays is written as a 4-byte integer. This is two times the number of grid nodes, plus one for the x values (which are only written once). The multiple of two is because each node has both an unassociated MFD (not associated with any faults) and an associated MFD (associated with a fault). For example, the 7636 grid nodes used for UCERF3 would write (7636 * 2) + 1 = 15273 arrays. Then each array is written, first with a 4-byte integer for the size of the array, followed by each 8-byte big endian value in the array. Empty arrays (size zero) mean that there is no MFD of that type at that node (for example, many nodes are not associated with any faults and so have no associated MFD).
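The layout described above can be sketched in Python as follows. This is only an illustrative reader, not the OpenSHA implementation: the function names `read_double_array_list` and `read_grid_source_mfds` are hypothetical, the array order (x values first, then the unassociated and associated arrays for each node in turn) is taken from the worked example on this page, and the size integers are assumed to be big endian like the other formats documented here.

```python
import struct


def read_double_array_list(data):
    """Parse a list of double arrays from bytes.

    Assumed layout: a 4-byte big endian count, then for each array a
    4-byte big endian size followed by that many 8-byte big endian doubles.
    """
    (num_arrays,) = struct.unpack_from(">i", data, 0)
    offset = 4
    arrays = []
    for _ in range(num_arrays):
        (size,) = struct.unpack_from(">i", data, offset)
        offset += 4
        if size:
            values = list(struct.unpack_from(">%dd" % size, data, offset))
        else:
            values = []  # size zero: no MFD stored in this slot
        offset += 8 * size
        arrays.append(values)
    return arrays


def read_grid_source_mfds(data):
    """Return (x_values, nodes) from the bytes of a grid sources binary file.

    Each entry of nodes is a tuple (unassociated_y, associated_y); an entry
    is None where the file holds an empty array. (Hypothetical helper.)
    """
    arrays = read_double_array_list(data)
    if len(arrays) % 2 != 1:
        raise ValueError("expected (2 * number of nodes) + 1 arrays")
    x_values = arrays[0]
    nodes = []
    for i in range(1, len(arrays), 2):
        unassociated = arrays[i] or None
        associated = arrays[i + 1] or None
        nodes.append((unassociated, associated))
    return x_values, nodes
```

The bytes would typically come from reading the `grid_sources.bin` entry of the zip file, and the node order is expected to match the grid node indexes of the region in `grid_sources_reg.xml`.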
Let's consider a simple example with 2 grid nodes where one associated MFD is null (note that actual grid source MFDs are discretized more finely):

'''Node 1:'''

Unassociated:
||= x =||= y =||
||5.0||0.5||
||5.5||0.1||
||6.0||1e-2||
||6.5||3e-5||
||7.0||1e-8||
||7.5||1e-11||

Associated sub seismogenic: null

'''Node 2:'''

Unassociated:
||= x =||= y =||
||5.0||0.4||
||5.5||0.2||
||6.0||2e-2||
||6.5||3e-5||
||7.0||2e-8||
||7.5||1e-10||

Associated sub seismogenic:
||= x =||= y =||
||5.0||0.2||
||5.5||0.1||
||6.0||3e-2||
||6.5||7e-5||
||7.0||4e-8||
||7.5||6e-11||

These would be written to the file as:

'''[[span(style=color: #FF0000, 5)]] [[span(style=color: #0000FF, 6)]] [[span(style=color: #00FFFF, 5.0 5.5 6.0 6.5 7.0 7.5)]] [[span(style=color: #0000FF, 6)]] [[span(style=color: #FFBF00, 0.5 0.1 1e-2 3e-5 1e-8 1e-11)]] [[span(style=color: #0000FF, 0)]] [[span(style=color: #0000FF, 6)]] [[span(style=color: #FFBF00, 0.4 0.2 2e-2 3e-5 2e-8 1e-10)]] [[span(style=color: #0000FF, 6)]] [[span(style=color: #0B610B, 0.2 0.1 3e-2 7e-5 4e-8 6e-11)]]'''

In this example, '''[[span(style=color: #FF0000, the number of arrays ((2 * the number of grid nodes) + 1), 4-byte integer, is in red)]]''', '''[[span(style=color: #0000FF, each array's size (integer) in blue)]]''', '''[[span(style=color: #00FFFF, x value array data (double values) are in cyan)]]''', '''[[span(style=color: #FFBF00, y value array data (double values) for unassociated MFDs are in orange)]]''', and '''[[span(style=color: #0B610B, y value array data (double values) for associated sub seismogenic MFDs are in green)]]'''.

=== MFD Binary File ===

Some mean solutions (averaged across multiple logic tree branches) may contain magnitude frequency distributions (MFDs) for each rupture. In this case, the rates.bin file will contain total rates and mags.bin will contain weighted average magnitudes. These MFDs can be used to more accurately represent the mean of multiple solutions instead of using the mean magnitude.
Additionally, solutions can optionally include subseismogenic magnitude frequency distributions for each fault subsection. These are not needed for most applications, but can be used instead of the "associated" MFDs provided in the gridded seismicity data files.

MFDs are written as a series of double arrays, with x values and y values separated into individual arrays. For example, consider these two functions:

Function 1:
||= x =||= y =||
||5.5||0.1||
||5.75||0.3||
||5.9||0.2||

Function 2:
||= x =||= y =||
||5.5||0.05||
||5.75||0.33||
||5.9||0.24||
||6.21||0.1||

These would be written to the file as:

'''[[span(style=color: #FF0000, 4)]] [[span(style=color: #0000FF, 3)]] [[span(style=color: #00FFFF, 5.5 5.75 5.9)]] [[span(style=color: #0000FF, 3)]] [[span(style=color: #FFBF00, 0.1 0.3 0.2)]] [[span(style=color: #0000FF, 4)]] [[span(style=color: #00FFFF, 5.5 5.75 5.9 6.21)]] [[span(style=color: #0000FF, 4)]] [[span(style=color: #FFBF00, 0.05 0.33 0.24 0.1)]]'''

In this example, '''[[span(style=color: #FF0000, the number of arrays (2 * the number of functions, integer) is in red)]]''', '''[[span(style=color: #0000FF, each array's size (integer) in blue)]]''', '''[[span(style=color: #00FFFF, x value array data (double values) are in cyan)]]''', and '''[[span(style=color: #FFBF00, y value array data (double values) are in orange)]]'''.

== Compound Fault System Solution Files ==

Compound Fault System Solution files are single zip files that contain all of the data for solutions across multiple UCERF3 logic tree branches. This format takes advantage of the fact that many branches contain duplicate information, so each duplicated file is written only once. For example, rakes depend only on the Fault Model (FM) and Deformation Model (DM); they do not depend on, for example, the Spatial Seismicity Kernel. So one 'rakes.bin' file is stored for each combination of FM/DM, for example, 'FM3_1_ZENGBB_rakes.bin'.
The 'rates.bin' files, however, are unique to each logic tree branch, and one is present for each branch. See the table below for a mapping of the logic tree branch levels that each file type depends on.

||= File Name =||= Logic Tree Branch Levels =||
||close_sections.bin||FM||
||cluster_rups.bin||FM||
||cluster_sects.bin||FM||
||fault_sections.xml||FM, DM||
||info.txt||ALL||
||mags.bin||FM, DM, Scale||
||rakes.bin||FM, DM||
||rates.bin||ALL||
||rup_areas.bin||FM, DM||
||rup_lengths.bin||FM||
||rup_avg_slips.bin||FM, DM, Scale||
||rup_sec_slip_type.txt||N/A||
||rup_sections.bin||FM||
||sect_areas.bin||FM, DM||
||sect_slips.bin||ALL BUT Dsr||
||sect_slips_std_dev.bin||ALL BUT Dsr||
||inv_rup_set_metadata.xml||ALL||
||inv_sol_metadata.xml||ALL||
||grid_sources.xml||ALL // old xml format||
||grid_sources_reg.xml||NONE // new binary format||
||grid_sources.bin||ALL // new binary format||
||rup_mfds.bin||ALL||
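
For readers not using Java, the two basic binary formats described under File Formats Used can be read with very little code. The sketch below is a hedged Python illustration for a single (non-compound) solution zip, where entries carry the plain file names listed under Zip File Contents. The function names `read_double_array`, `read_int_array_list`, and `load_rupture_data` are hypothetical and not part of OpenSHA, and the length check at the end is an added sanity test rather than something the format requires.

```python
import struct
import zipfile


def read_double_array(data):
    """Decode a series of 8-byte big endian doubles into a list of floats."""
    if len(data) % 8 != 0:
        raise ValueError("file size is not a multiple of 8 bytes")
    return list(struct.unpack(">%dd" % (len(data) // 8), data))


def read_int_array_list(data):
    """Decode an integer array list: count, then (size, values...) per array.

    Every entry is a 4-byte big endian integer.
    """
    (num_arrays,) = struct.unpack_from(">i", data, 0)
    offset = 4
    arrays = []
    for _ in range(num_arrays):
        (size,) = struct.unpack_from(">i", data, offset)
        offset += 4
        if size:
            values = list(struct.unpack_from(">%di" % size, data, offset))
        else:
            values = []
        offset += 4 * size
        arrays.append(values)
    return arrays


def load_rupture_data(zip_source):
    """Read magnitudes, rates, and sub section lists for every rupture.

    zip_source may be a path or a file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(zip_source) as zf:
        mags = read_double_array(zf.read("mags.bin"))
        rates = read_double_array(zf.read("rates.bin"))
        rup_sections = read_int_array_list(zf.read("rup_sections.bin"))
    # All three files are indexed by rupture, so their lengths should agree.
    if not (len(mags) == len(rates) == len(rup_sections)):
        raise ValueError("rupture counts differ between files")
    return mags, rates, rup_sections
```

The same `read_double_array` helper applies to any of the files listed as "Double array binary", and `read_int_array_list` to any listed as "Integer array list binary". Entries in a compound file carry branch-specific prefixes, so the plain names used here would not match there.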