Added initial version of DB schema doc from the wiki

Commit d56e01a2 authored 4 years ago by Brian Carrier
--- a/bindings/java/doxygen/Doxyfile
+++ b/bindings/java/doxygen/Doxyfile
@@ -765,6 +765,7 @@ INPUT                  = main.dox \
                         artifact_catalog.dox \
                         insert_and_update_database.dox \
                         communications.dox \
+                         db_schema_8_6.dox \
 						 ../src
 # This tag can be used to specify the character encoding of the source files

--- a/bindings/java/doxygen/db_schema_8_6.dox
+++ b/bindings/java/doxygen/db_schema_8_6.dox
+/*! \page db_schema_8_6_page TSK & Autopsy Database Schema (Schema version 8.6)
+# Introduction
+This page outlines version 8.6 of the database schema that is used by The Sleuth Kit and Autopsy. The goal of this page is to provide short descriptions of each table and column, not to cover foreign key requirements, etc. If you want that level of detail, refer to the actual schema in addition to this page. 
+You can find a basic graphic of some of the table relationships <a href="https://docs.google.com/drawings/d/1omR_uUAp1fQt720oJ-kk8C48BXmVa3PNjPZCDdT0Tb4/edit?usp#sharing">here</a>.
+Some general notes on this schema:
+- Nearly every type of data is assigned a unique ID, called the Object ID
+- The objects form a hierarchy that shows where data came from.  A child comes from its parent.  
+ - For example, a disk image is the root, with a volume system below it, then a file system, and then files and directories. 
+- This schema has been designed to store data beyond the file system data that The Sleuth Kit supports. It can store carved files, a folder full of local files, etc.
+- The Blackboard is used to store artifacts, which contain attributes (name/value pairs).  Artifacts are used to store data types that do not have more formal tables. Module writers can make whatever artifact types they want. See \ref mod_bbpage for more details. 
+- The Sleuth Kit will make virtual files to span the unallocated space.  They will have a naming format of 'Unalloc_[PARENT-OBJECT-ID]_[BYTE-START]_[BYTE-END]'.
+# General Information Tables 
+## tsk_db_info 
+Metadata about the database.
+- **schema_ver** - Version of the database schema used to create the database (the major version number, so 8 for the schema described on this page)
+- **tsk_ver** - Version of TSK used to create database
+## tsk_db_info_extended
+Name/value pair table that stores any additional information about the database, for example, which schema version it was created with. 
+- **name** - Any string name
+- **value** - Any string value
+# Object Tables 
+## tsk_objects 
+Every object (image, volume system, file, etc.) has an entry in this table.  This table allows you to find the parent of a given object and allows objects to be tagged and have children.  This table provides items with a unique object id.  The details of the object are in other tables.  
+- **obj_id** - Unique id 
+- **par_obj_id** - The object id of the parent object (null for root objects). The parent of a volume system is an image, the parent of a directory is a directory or filesystem, the parent of a filesystem is a volume or an image, etc.
+- **type** - Object type (as org.sleuthkit.datamodel.TskData.ObjectType enum).
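+
+As a rough illustration of the parent/child hierarchy, a recursive query along the following lines walks from one object up to its root.  This is only a sketch: it assumes a database engine that supports WITH RECURSIVE (such as SQLite or PostgreSQL), and the object id 42 is a made-up placeholder.
+
+```sql
+-- Walk from object 42 (hypothetical id) up to its root object.
+WITH RECURSIVE ancestors(obj_id, par_obj_id, type, depth) AS (
+    SELECT obj_id, par_obj_id, type, 0
+    FROM tsk_objects
+    WHERE obj_id = 42
+    UNION ALL
+    SELECT o.obj_id, o.par_obj_id, o.type, a.depth + 1
+    FROM tsk_objects o
+    JOIN ancestors a ON o.obj_id = a.par_obj_id
+)
+SELECT obj_id, par_obj_id, type, depth
+FROM ancestors
+ORDER BY depth;
+```
+
+The last row returned should be a root object, that is, one whose par_obj_id is NULL.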
+# Data Source/Device Tables 
+## data_source_info
+Contains information about a data source, which could be an image.  This is where we group data sources into devices (based on device ID)
+- **obj_id** - Id of image/data source in tsk_objects
+- **device_id** - Unique ID (GUID) for the device that contains the data source. 
+- **time_zone** - Timezone that the data source was originally located in. 
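+
+Because data sources are grouped into devices by device ID, a query such as the following sketch shows how many data sources each device has.  It uses only the columns listed above.
+
+```sql
+-- Count the data sources recorded for each device.
+SELECT device_id, COUNT(*) AS data_source_count
+FROM data_source_info
+GROUP BY device_id
+ORDER BY data_source_count DESC;
+```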
+# Disk Image Tables
+## tsk_image_info 
+Contains information about each set of images that is stored in the database. 
+- **obj_id** - Id of image in tsk_objects
+- **type** - Type of disk image format (as org.sleuthkit.datamodel.TskData.TSK_IMG_TYPE_ENUM)
+- **ssize** - Sector size of device in bytes
+- **tzone** - Timezone where image is from (the same format that TSK tools want as input)
+- **size** - Size of the original image (in bytes) 
+- **md5** - Hash of the image.  Currently, this is populated only if the input image is E01. 
+- **display_name** - Display name of the image. 
+## tsk_image_names
+Stores path(s) to file(s) on disk that make up an image set.
+- **obj_id** - Id of image in tsk_objects
+- **name** - Path to location of image file on disk
+- **sequence** - Position in sequence of image parts
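+
+The two image tables can be combined to list the files that make up each image set.  The following is a minimal sketch that uses only the columns described above.
+
+```sql
+-- List the files that make up each image set, in sequence order.
+SELECT i.obj_id, i.display_name, i.size, n.sequence, n.name
+FROM tsk_image_info i
+JOIN tsk_image_names n ON n.obj_id = i.obj_id
+ORDER BY i.obj_id, n.sequence;
+```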
+# Volume System Tables
+## tsk_vs_info
+Contains one row for every volume system found in the images.
+- **obj_id** - Id of volume system in tsk_objects
+- **vs_type** - Type of volume system / media management (as org.sleuthkit.datamodel.TskData.TSK_VS_TYPE_ENUM)
+- **img_offset** - Byte offset where VS starts in disk image
+- **block_size** - Size of blocks in bytes
+## tsk_vs_parts
+Contains one row for every volume / partition in the images. 
+- **obj_id** - Id of volume in tsk_objects
+- **addr** - Address of this partition
+- **start** - Sector offset of start of partition
+- **length** - Number of sectors in partition
+- **desc** - Description of partition (volume system type-specific)
+- **flags** - Flags for partition (as org.sleuthkit.datamodel.TskData.TSK_VS_PART_FLAG_ENUM)
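+
+Neither table stores a direct link between a volume and its volume system; the link goes through tsk_objects.  The sketch below assumes that the parent of each volume is its volume system, which follows from the hierarchy described in the introduction but is not stated explicitly for this table.
+
+```sql
+-- List partitions together with their parent volume system.
+-- Assumes the parent object of a volume is its volume system.
+SELECT vs.obj_id AS vs_obj_id, p.obj_id AS part_obj_id, p.addr, p.start, p.length
+FROM tsk_vs_parts p
+JOIN tsk_objects o ON o.obj_id = p.obj_id
+JOIN tsk_vs_info vs ON vs.obj_id = o.par_obj_id
+ORDER BY vs.obj_id, p.addr;
+```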
+## tsk_pool_info 
+Contains information about pools (for APFS, logical disk management, etc.)
+- TODO: Fill in columns
+# File System Tables
+## tsk_fs_info
+Contains one row for every file system in the images. 
+- **obj_id** - Id of filesystem in tsk_objects
+- **img_offset** - Byte offset that filesystem starts at
+- **fs_type** - Type of file system (as org.sleuthkit.datamodel.TskData.TSK_FS_TYPE_ENUM)
+- **block_size** - Size of each block (in bytes)
+- **block_count** - Number of blocks in filesystem
+- **root_inum** - Metadata address of root directory
+- **first_inum** - First valid metadata address
+- **last_inum** - Last valid metadata address
+- **display_name** - Display name of file system (could be volume label) (New in V3)
+## tsk_files
+Contains one row for every file found in the images.  Has the basic metadata for the file. 
+- **obj_id** - Id of file in tsk_objects
+- **fs_obj_id** - Id of filesystem in tsk_objects (NULL if file is not located in a file system -- carved in unpartitioned space, etc.)
+- **type** - Type of file: filesystem, carved, etc. (as org.sleuthkit.datamodel.TskData.TSK_DB_FILES_TYPE_ENUM enum)
+- **attr_type** - Type of attribute (as org.sleuthkit.datamodel.TskData.TSK_FS_ATTR_TYPE_ENUM)
+- **attr_id** - Id of attribute
+- **name** - Name of attribute. Will be NULL if attribute doesn't have a name.  Must not have any slashes in it. 
+- **meta_addr** - Address of the metadata structure that the name points to.
+- **meta_seq** - Sequence of the metadata address (New in V3)
+- **has_layout** - True if file has an entry in tsk_file_layout
+- **has_path** - True if file has an entry in tsk_files_path
+- **dir_type** - File type information: directory, file, etc. (as org.sleuthkit.datamodel.TskData.TSK_FS_NAME_TYPE_ENUM)
+- **meta_type** - File type (as org.sleuthkit.datamodel.TskData.TSK_FS_META_TYPE_ENUM)
+- **dir_flags** -  Flags that describe allocation status etc. (as org.sleuthkit.datamodel.TskData.TSK_FS_NAME_FLAG_ENUM)
+- **meta_flags** - Flags for this file for its allocation status etc. (as org.sleuthkit.datamodel.TskData.TSK_FS_META_FLAG_ENUM)
+- **size** - File size in bytes
+- **ctime** - Last file / metadata status change time (stored in number of seconds since Jan 1, 1970 UTC)
+- **crtime** - Created time
+- **atime** - Last file content accessed time
+- **mtime** - Last file content modification time
+- **mode** - Unix-style permissions (as org.sleuthkit.datamodel.TskData.TSK_FS_META_MODE_ENUM)
+- **uid** - Owner id
+- **gid** - Group id
+- **md5** - MD5 hash of file contents
+- **known** - Known status of file (as org.sleuthkit.datamodel.TskData.FileKnown)
+- **parent_path** - Full path of parent folder. Must begin and end with a '/' (Note that a single '/' is valid).
+- **mime_type** - MIME type of the file content, if it has been detected. 
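+
+Since parent_path always ends with a '/', the full path of a file can be built by concatenating parent_path and name.  The query below is an illustrative sketch; the MIME type and the row limit are arbitrary example values.
+
+```sql
+-- Largest files detected as JPEG, with their full paths.
+SELECT obj_id,
+       parent_path || name AS full_path,
+       size,
+       mtime
+FROM tsk_files
+WHERE mime_type = 'image/jpeg'
+ORDER BY size DESC
+LIMIT 20;
+```
+
+Rows whose mime_type has not been detected will not match the filter.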
+## tsk_file_layout
+Stores the layout of a file within the image.  A file will have one or more rows in this table depending on how fragmented it was. All file types use this table (file system, carved, unallocated blocks, etc.).
+- **obj_id** - Id of file in tsk_objects
+- **sequence** - Position of this run in the file (0-based and the obj_id and sequence pair will be unique in the table)
+- **byte_start** - Byte offset of fragment relative to the start of the image file
+- **byte_len** - Length of fragment in bytes
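+
+The runs of a file can be read back in order using the sequence column.  The following sketch uses a made-up object id of 42, and the second query summarizes how fragmented each file is.
+
+```sql
+-- Runs that make up file 42 (hypothetical id), in file order.
+SELECT sequence, byte_start, byte_len
+FROM tsk_file_layout
+WHERE obj_id = 42
+ORDER BY sequence;
+
+-- Number of runs and total bytes covered, per file.
+SELECT obj_id, COUNT(*) AS run_count, SUM(byte_len) AS total_bytes
+FROM tsk_file_layout
+GROUP BY obj_id;
+```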
+## tsk_files_path
+If a "locally-stored" file has been imported into the database for analysis, then this table stores its path.  Used for derived files and other files that are not directly in the image file.
+- **obj_id** - Id of file in tsk_objects
+- **path** - Path to where the file is locally stored in a file system.
+- **encoding_type** - Method used to store the file on the disk. 
+## file_encoding_types 
+Methods that can be used to store files on local disks to prevent them from being quarantined by antivirus software.
+- **encoding_type** - ID of method used to store data.  See org.sleuthkit.datamodel.TskData.EncodingType enum. 
+- **name** -  Display name of technique. 
+## tsk_files_derived_method
+Derived files are those that result from analyzing another file.  For example, files that are extracted from a ZIP file will be considered derived.  This table keeps track of the derivation techniques that were used to make the derived files. 
+NOTE: This table is not used in any code.
+- **derived_id** - Unique id for this derivation method. 
+- **tool_name** - Name of derivation method/tool
+- **tool_version** - Version of tool used in derivation method
+- **other** - Other details
+## tsk_files_derived
+Each derived file has a row that captures the information needed to re-derive it.
+NOTE: This table is not used in any code.
+- **obj_id** - Id of file in tsk_objects
+- **derived_id** - Id of derivation method in tsk_files_derived_method
+- **rederive** - Details needed to re-derive file (will be specific to the derivation method)
+# Blackboard Tables 
+The \ref mod_bbpage is used to store results from analysis modules. 
+## blackboard_artifacts
+Stores artifacts associated with objects.
+- **artifact_id** - Id of the artifact (assigned by the database)
+- **obj_id** - Id of the associated object
+- **artifact_type_id** - Id for the type of artifact (can be looked up in the blackboard_artifact_types table)
+## blackboard_attributes
+Stores name/value pairs associated with an artifact. Only one of the value columns should be populated.
+- **artifact_id** - Id of the associated artifact.
+- **source** - Source string, should be module name that created the entry.
+- **context** - Additional context string
+- **attribute_type_id** - Id for the type of attribute (can be looked up in the blackboard_attribute_types)
+- **value_type** - The type of value (0 for string, 1 for int, 2 for long, 3 for double, 4 for byte array)
+- **value_byte** - A blob of binary data (should be empty unless the value type is byte)
+- **value_text** - A string of text (should be empty unless the value type is string)
+- **value_int32** - An integer (should be 0 unless the value type is int)
+- **value_int64** - A long integer (should be 0 unless the value type is long)
+- **value_double** - A double (should be 0.0 unless the value type is double)
+## blackboard_artifact_types
+Types of artifacts
+- **artifact_type_id** - Id for the type (this is used by the blackboard_artifacts table)
+- **type_name** - A string identifier for the type (unique)
+- **display_name** - A display name for the type (not unique, should be human readable)
+## blackboard_attribute_types
+Types of attribute
+- **attribute_type_id** - Id for the type (this is used by the blackboard_attributes table)
+- **type_name** - A string identifier for the type (unique)
+- **display_name** - A display name for the type (not unique, should be human readable)
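+
+To show how the four blackboard tables fit together, the sketch below lists the attributes of every artifact attached to one object.  The object id 42 is a made-up placeholder, and the CASE expression follows the value_type codes listed for blackboard_attributes.
+
+```sql
+-- Attributes of every artifact attached to object 42 (hypothetical id).
+SELECT a.artifact_id,
+       art_t.type_name AS artifact_type,
+       attr_t.type_name AS attribute_type,
+       CASE attr.value_type
+           WHEN 0 THEN attr.value_text
+           WHEN 1 THEN CAST(attr.value_int32 AS TEXT)
+           WHEN 2 THEN CAST(attr.value_int64 AS TEXT)
+           WHEN 3 THEN CAST(attr.value_double AS TEXT)
+       END AS attribute_value
+FROM blackboard_artifacts a
+JOIN blackboard_artifact_types art_t ON art_t.artifact_type_id = a.artifact_type_id
+JOIN blackboard_attributes attr ON attr.artifact_id = a.artifact_id
+JOIN blackboard_attribute_types attr_t ON attr_t.attribute_type_id = attr.attribute_type_id
+WHERE a.obj_id = 42
+ORDER BY a.artifact_id;
+```
+
+Byte array values (value_type 4) are left as NULL in this sketch because they do not convert cleanly to text.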
+# Communication Accounts
+TODO
+\ref mod_compage
+## accounts
+## account_types
+## account_relationships
+# Timeline
+TODO
+## tsk_event_types
+## tsk_event_descriptions
+## tsk_events
+# Examiners and Reports
+TODO
+## tsk_examiners
+## reports
+# Tags 
+## tag_names table
+Defines the tag names that the user has created and that can therefore be applied.
+- **tag_name_id** - Unique ID for each tag name
+- **display_name** - Display name of tag
+- **description** - Description (can be empty string)
+- **color** - Color choice for tag (can be empty string)
+## tsk_tag_sets
+TODO
+## content_tags table
+One row for each file tagged.  
+- **tag_id** - Unique ID
+- **obj_id** - Object id of Content that has been tagged
+- **tag_name_id** - Tag name that was used
+- **comment** - Optional comment 
+- **begin_byte_offset** - Optional byte offset into file that was tagged
+- **end_byte_offset** - Optional byte ending offset into file that was tagged
+## blackboard_artifact_tags table
+One row for each artifact that is tagged.
+- **tag_id** - Unique ID
+- **artifact_id** - Artifact ID of artifact that was tagged
+- **tag_name_id** - Tag name that was used
+- **comment** - Optional comment
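+
+The tag tables refer to tag names by ID, so a join is needed to see the display name.  The sketch below lists tagged files; it joins to tsk_files, so tagged content that is not a file is left out.
+
+```sql
+-- Tagged files with the tag display name and any comment.
+SELECT f.parent_path || f.name AS full_path,
+       t.display_name AS tag_name,
+       c.comment
+FROM content_tags c
+JOIN tag_names t ON t.tag_name_id = c.tag_name_id
+JOIN tsk_files f ON f.obj_id = c.obj_id
+ORDER BY t.display_name, full_path;
+```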
+# Ingest Module Status
+These tables let Autopsy keep track of which modules were run on the data sources.
+TODO
+## ingest_module_types table
+Defines the types of ingest modules supported. 
+- type_id 
+- type_name 
+## ingest_modules
+Defines which modules were installed.  One row for each module. 
+- ingest_module_id 
+- display_name 
+- unique_name 
+- type_id 
+- version 
+## ingest_job_status_types table
+- type_id 
+- type_name 
+## ingest_jobs
+One row is created each time ingest is started, which is a set of modules in a pipeline. 
+- ingest_job_id 
+- obj_id 
+- host_name 
+- start_date_time 
+- end_date_time 
+- status_id 
+- settings_dir 
+## ingest_job_modules
+Defines the order of the modules in a given pipeline (i.e. ingest_job)
+- ingest_job_id 
+- ingest_module_id 
+- pipeline_position 
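+
+As a sketch of how these tables relate, the query below lists the modules of each ingest job in pipeline order.  The column descriptions are still TODO above, so treating ingest_jobs.obj_id as the object id of the data source is an assumption.
+
+```sql
+-- Modules in each ingest job, in pipeline order.
+-- Assumes ingest_jobs.obj_id identifies the data source that was ingested.
+SELECT j.ingest_job_id, j.obj_id, j.host_name,
+       jm.pipeline_position, m.display_name, m.version
+FROM ingest_jobs j
+JOIN ingest_job_modules jm ON jm.ingest_job_id = j.ingest_job_id
+JOIN ingest_modules m ON m.ingest_module_id = jm.ingest_module_id
+ORDER BY j.ingest_job_id, jm.pipeline_position;
+```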
+*/
--- a/bindings/java/doxygen/main.dox
+++ b/bindings/java/doxygen/main.dox
@@ -40,6 +40,12 @@ You can also access the data in its tree form by starting with org.sleuthkit.dat
 - \subpage mod_bbpage is where analysis modules (such as those in Autopsy) can post and save their results. 
 - The \subpage artifact_catalog_page gives a list of the current artifacts and attributes used on \ref mod_bbpage.
 - \subpage mod_compage is where analysis modules can store and retrieve communications-related data. 
+\section main_db Database Topics
+The Sleuth Kit has its own database schema that is shared with Autopsy and other tools. The primary way it gets populated is via the Java code. 
+- Database Schema Documentation:
+ - \subpage db_schema_8_6_page 
 - Refer to \subpage query_database_page if you are going to use one of the SleuthkitCase methods that requires you to specify a query. 
 - Refer to \subpage insert_and_update_database_page if you are a Sleuth Kit developer and want to avoid database issues.