DICOM Research

Definition

DICOM, which stands for "Digital Imaging and Communications in Medicine", is a standard that defines how the various pieces of digital medical imaging equipment ("modalities") communicate. It is now used by the majority of medical imaging hardware manufacturers. The standard makes it possible for equipment to exchange data remotely over a network or on physical media (disks or tapes), ensuring compatibility between devices and eliminating proprietary formats.

Goals

The goal is to put the images of the patient, and all important information associated with the patient, in a format that allows medical imaging equipment to interconnect and interact with the data easily.

Current version

The current version comprises more than 12 documents:
PS 3.1-1992 - DICOM 3 - Introduction & Overview
PS 3.8-1992 - DICOM 3 - Network Communication Support

PS 3.2-1993 - DICOM 3 - Conformance
PS 3.3-1993 - DICOM 3 - Information Object Definitions
PS 3.4-1993 - DICOM 3 - Service Class Specifications
PS 3.5-1993 - DICOM 3 - Data Structures & Encoding
PS 3.6-1993 - DICOM 3 - Data Dictionary
PS 3.7-1993 - DICOM 3 - Message Exchange
PS 3.9-1993 - DICOM 3 - Point-to-Point Communication
PS 3.10-1995 - DICOM 3 - Media Storage & File Format
PS 3.11-1995 - DICOM 3 - Media Storage Application Profiles
PS 3.12-1995 - DICOM 3 - Media Formats & Physical Media


An introduction to the DICOM file format

The Digital Imaging and Communications in Medicine (DICOM) standard was created by the National Electrical Manufacturers Association (NEMA) to aid the distribution and viewing of medical images, such as CT scans, MRIs, and ultrasound. A single DICOM file contains both a header (which stores information about the patient's name, the type of scan, image dimensions, etc), as well as all of the image data (which can contain information in three dimensions). DICOM is the most common standard for receiving scans from a hospital.


The DICOM header

The image on the left shows a hypothetical DICOM image file. In this example, the first 794 bytes are used for a DICOM format header, which describes the image dimensions and retains other text information about the scan. The size of this header varies depending on how much header information is stored. Here, the header defines an image with dimensions of 109x91x2 voxels and a data resolution of 1 byte per voxel (so the total image size is 109 x 91 x 2 x 1 = 19,838 bytes). The image data follows the header information (the header and the image data are stored in the same file).

Further down, I show a more detailed list of the DICOM header. Note that DICOM requires a 128-byte preamble (these 128 bytes are usually all set to zero), followed by the letters 'D', 'I', 'C', 'M'. This is followed by the header information, which is organized in 'groups'. For example, the group 0002 hex is the file meta information group, and (in the example on the left) contains 3 elements: one defines the group length, one stores the file version and the third stores the transfer syntax.
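The preamble-and-prefix check described above can be sketched in a few lines of Python. The function name has_dicm_prefix and the sample bytes are my own, made up for illustration; this is a minimal sketch, not a full validator.

```python
def has_dicm_prefix(data: bytes) -> bool:
    """Return True if bytes 128..131 spell 'DICM' (the DICOM prefix)."""
    # The first 128 bytes are the preamble; its content is not checked here.
    return len(data) >= 132 and data[128:132] == b"DICM"


# Made-up sample: a zero-filled preamble followed by the prefix.
sample = bytes(128) + b"DICM"
print(has_dicm_prefix(sample))        # True
print(has_dicm_prefix(b"not dicom"))  # False
```

Only the four prefix letters are tested, because the preamble itself is not guaranteed to be all zeros.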


The DICOM elements required depend on the image type, and are listed in Part 3 of the DICOM standard. For example, this image modality is 'MR' (see group:element 0008:0060), so it should have elements to describe the MRI echo time. The absence of this information in this image is a violation of the DICOM standard. In practice, most DICOM format viewers (including ezDICOM) do not check for the presence of most of these elements, extracting only the header information, which describes the image size.

Of particular importance is group:element 0002:0010. This defines the 'Transfer Syntax Unique Identifier' (see the table below). The Transfer Syntax UID reports the byte order for raw data. Different computers store integer values differently, using so-called 'big endian' and 'little endian' ordering. Consider a 16-bit integer with the value 258: the most significant byte stores the value 01 (=256), while the least significant byte stores the value 02. Some computers would save this value as 01:02, while others would store it as 02:01. Therefore, for data with more than 8 bits per sample, a DICOM viewer may need to swap the byte order of the data to match the ordering used by your computer.
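As a rough illustration of the byte swapping a viewer might do, here is a minimal Python sketch. The function name to_native_16bit is hypothetical, and it assumes the samples are unsigned 16-bit values and that the caller already knows the file's byte order from the Transfer Syntax UID.

```python
import sys
from array import array


def to_native_16bit(raw: bytes, file_is_little_endian: bool) -> array:
    """Decode raw unsigned 16-bit samples into native-order integers."""
    samples = array("H")  # 'H' is a 2-byte unsigned integer on common platforms
    samples.frombytes(raw)
    native_is_little = sys.byteorder == "little"
    if file_is_little_endian != native_is_little:
        samples.byteswap()  # swap the two bytes of every sample
    return samples


# The bytes 02 01 in little-endian order and 01 02 in big-endian order
# both encode the same number, 0x0102.
print(to_native_16bit(b"\x02\x01", file_is_little_endian=True)[0])
print(to_native_16bit(b"\x01\x02", file_is_little_endian=False)[0])
```

The swap is only performed when the file's ordering differs from the machine's, so the same call works on either kind of computer.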

In addition to the Transfer Syntax UID, the image is also specified by the Samples Per Pixel (0028:0002), the Photometric Interpretation (0028:0004) and the Bits Allocated (0028:0100). For most MRI and CT images, the photometric interpretation is a continuous monochrome (typically depicted with pixels in grayscale). In DICOM, these monochrome images are given a photometric interpretation of 'MONOCHROME1' (low values=bright, high values=dim) or 'MONOCHROME2' (low values=dark, high values=bright). However, many ultrasound images and medical photographs include colour, and these are described by different photometric interpretations (e.g. Palette, RGB, CMYK, YBR, etc.). Some colour images (e.g. RGB) store 3 samples per pixel (one each for red, green and blue), while monochrome and paletted images typically store only one sample per pixel. Most images store 8 bits (256 levels) or 16 bits (65,536 levels) per sample, though some scanners save data in 12-bit or 32-bit resolution. So an RGB image that stores 3 samples per pixel at 8 bits per sample can potentially describe 16 million colours (256 cubed).
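The arithmetic and the two monochrome conventions above can be sketched as follows. Both function names are my own, and the sketch assumes unsigned samples with no rescaling or windowing, which a real viewer would also have to handle.

```python
def sample_levels(bits: int) -> int:
    """Number of distinct values a sample of the given bit depth can hold."""
    return 2 ** bits


def to_display_brightness(sample: int, bits: int, photometric: str) -> int:
    """Map a stored sample to a brightness where 0 is black and the maximum is white."""
    max_value = sample_levels(bits) - 1
    if photometric == "MONOCHROME1":
        return max_value - sample  # low stored values are bright, so invert
    if photometric == "MONOCHROME2":
        return sample              # low stored values are already dark
    raise ValueError("not a monochrome photometric interpretation: " + photometric)


print(sample_levels(8))                               # 256
print(sample_levels(8) ** 3)                          # 16777216 colours for 8-bit RGB
print(to_display_brightness(0, 8, "MONOCHROME1"))     # 255
print(to_display_brightness(0, 8, "MONOCHROME2"))     # 0
```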

Transfer Syntax UID        Definition
1.2.840.10008.1.2          Raw data, Implicit VR, Little Endian
1.2.840.10008.1.2.x        Raw data, Explicit VR (x = 1: Little Endian; x = 2: Big Endian)
1.2.840.10008.1.2.4.xx     JPEG compression (xx = 50-64: Lossy JPEG; xx = 65-70: Lossless JPEG)
1.2.840.10008.1.2.5        Lossless Run Length Encoding
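A program could turn the table above into a small lookup. The sketch below is only an illustration: the function name describe_transfer_syntax is hypothetical, and the JPEG ranges are copied from the table rather than checked against the standard.

```python
def describe_transfer_syntax(uid: str) -> str:
    """Describe a Transfer Syntax UID using the ranges from the table."""
    uid = uid.rstrip("\x00 ")  # UIDs in files may carry a trailing padding byte
    base = "1.2.840.10008.1.2"
    if uid == base:
        return "Raw data, Implicit VR, Little Endian"
    if uid == base + ".1":
        return "Raw data, Explicit VR, Little Endian"
    if uid == base + ".2":
        return "Raw data, Explicit VR, Big Endian"
    if uid == base + ".5":
        return "Lossless Run Length Encoding"
    jpeg_prefix = base + ".4."
    if uid.startswith(jpeg_prefix):
        suffix = uid[len(jpeg_prefix):]
        if suffix.isdigit():
            xx = int(suffix)
            if 50 <= xx <= 64:
                return "JPEG compression, lossy"
            if 65 <= xx <= 70:
                return "JPEG compression, lossless"
    return "Unknown transfer syntax"


print(describe_transfer_syntax("1.2.840.10008.1.2"))
print(describe_transfer_syntax("1.2.840.10008.1.2.4.50"))
```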



A More Detailed Look at the Header

Preamble - The first 128 bytes of the DICOM image file header are called the file's Preamble.

Prefix - The four bytes that follow the preamble (starting at the 129th byte) contain the letters 'D', 'I', 'C', 'M' and are called the Prefix.

UID - Unique Identifier: a string of numbers and periods whose root is unique to each organization registered with ISO. Among other things, a UID is used to identify the transfer syntax, and therefore the byte ordering. For example, "1.2.840.10008.1.2" is the UID defined as "Implicit VR Little Endian".

Here:
1 :- indicates ISO
2 :- indicates the ISO member body branch
840 :- indicates the member body country code
10008 :- is registered by ANSI to NEMA for DICOM UIDs.

Transfer Syntax - The encoding rules used for the data. The byte order can be Little Endian or Big Endian.

Little Endian - A type of byte ordering in which the least significant byte is stored first and the most significant byte last. For example, consider this 16-bit binary value read from a DICOM file as is:
11111111 00000000
(first byte) (second byte)

This would tempt us to judge the value to be 65280 (the decimal equivalent of the binary number as written). But it is not! The value must be interpreted with the second byte first and then the first byte, i.e.,
00000000 11111111
(second byte) (first byte)
so the decimal value is 255. Implicit VR Little Endian is the default Transfer Syntax.

Big Endian - Another type of byte ordering, the opposite of Little Endian. The most significant byte is stored first (on the left) and the least significant byte last (on the right). Read as Big Endian, the decimal equivalent of 11111111 00000000 is 65280!
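The two readings of the same pair of bytes can be checked with Python's built-in int.from_bytes. This is just a small demonstration using the bit pattern from the example; the variable names are my own.

```python
# The two bytes exactly as they appear in the file: first byte, second byte.
raw = bytes([0b11111111, 0b00000000])

as_little = int.from_bytes(raw, "little")  # second byte is the most significant
as_big = int.from_bytes(raw, "big")        # first byte is the most significant

print(as_little)  # 255
print(as_big)     # 65280
```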

Tag - A DICOM tag (e.g., 0010,0010) consists of two parts:
1. Group - Indicates the type of information, e.g., 0010 indicates that the information is about the patient.
2. Element - Uniquely identifies a data item within the group, e.g., 0010 indicates the patient name.

Value Representation - The VR of a tag describes the data type of the value. It can be Explicit or Implicit. If Explicit, the data type is stored in the file. If Implicit, the data type is not stored in the file.

Value Length - Gives the length of the value in bytes. The field itself occupies 2 or 4 bytes, depending on the transfer syntax. E.g., if the value of the Patient Name tag (0010,0010) were 'Roentgen', the Value Length would be 8.

Value Field - An even number of bytes containing the value(s) of the data element. Nothing but the value itself!

Value Multiplicity - The number of values encoded in the Value Field. E.g., if the VM is 3, there are 3 values in the value field.
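For text values, counting the values is straightforward once you know the delimiter. The sketch below relies on a detail not stated above: as far as I know, DICOM separates multiple character-string values with a backslash. The function names are hypothetical, and binary VRs (which pack fixed-size values back to back) are not covered.

```python
def split_values(value: str) -> list:
    """Split a multi-valued DICOM text value on the backslash delimiter."""
    value = value.rstrip(" \x00")  # drop trailing padding, if any
    if not value:
        return []
    return value.split("\\")


def value_multiplicity(value: str) -> int:
    """Number of values found in a text Value Field."""
    return len(split_values(value))


# Made-up example value with three parts.
example = "ORIGINAL\\PRIMARY\\AXIAL"
print(split_values(example))
print(value_multiplicity(example))  # 3
```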

SCP - Service Class Provider. The role played by the DICOM application or AE (Application Entity) that performs the operation. For instance, if images are being transferred from the console to your PC, your PC (your DICOM application or AE) is the Service Class Provider. What service does it provide, and to whom? It stores the images sent by the console. That is the service it provides to the console, which is the SCU here!

SCU - Service Class User. The role played by the DICOM application or AE that invokes the operation. In the example above, the console is the SCU. It makes use of the storage service provided by the SCP (your PC), hence the name!

Association - A communication connection established between two DICOM applications by which DICOM information is exchanged. A device may support one or more associations simultaneously.

Packet - A small (usually) portion of a larger message that is being communicated. In addition to the message fragment, the packet has header information that allows it to be sent to the correct location and to be put in correct order should the multiple packets of a message arrive out of sequence. The packet also usually contains information that allows a communication system to determine if it got corrupted on the way to its destination.

PDU - Protocol Data Unit. Atomic unit of the message you send.

PACS - Picture Archiving and Communication System.

DICOM Specifics for Programming

In broad terms, a DICOM image file consists of:
1. Header Information
2. Image Pixel data.

See the table below.
Preamble

The first 128 bytes contain a blank area called the preamble. This is a comment area.

Prefix

DICM is the prefix that identifies that this file is a DICOM image file. Strictly speaking, all the DICOM files must possess this preamble & prefix.

DATA ELEMENTS (Tag,VR, VL,Values)

A Data Element consists of a Tag, a VR - Value Representation (optional), a VL - Value Length, and the actual Value. Every DICOM Tag contains a Group number followed by an Element number. For example, in the tag '0010 0010', the first four digits '0010' are the group number, which tells you that the following information is about the 'Patient'. The last four digits, '0010', pinpoint the kind of information. Here, '0010' is the Patient Name.

DATA ELEMENTS (Tag,VR, VL, Values)

It is the Data elements all the way down...

Pixel Data

And then comes the pixel data. Pixel data is nothing but the image: the image's pixel values are stored here. The pixel data array is preceded by a unique tag, '7FE0 0010', which tells you that what follows is the pixel data.

All the information about the image, patient, study, etc. are stored in the header.
If you are writing a program to display the image, you need to read this header to obtain the necessary information about it.
In DICOM jargon, the entire DICOM image file is called a Dataset.

A Dataset consists of Data Elements, each of which contains:

1. Tag - Uniquely identifies a piece of information. A tag is a combination of a Group No. and an Element No.

For example, in the tag (0010,0020) the Group No. is 0010 and the Element No. is 0020.
A Group No. tells you about an entity. An Element No. identifies the exact information in the group.
For example:
Group No. 0010 tells you that the information is about the PATIENT. Element No. 0020 tells you that the information is the PATIENT ID.

Some Groups (group numbers are in hexadecimal):
Group 0002: File meta information
Group 0008: General series info.
Group 0010: Patient info.
Group 0020: General study info.
Group 0028: Image info.
Group 7FE0: Image pixel data

2. VR (Value Representation) - Tells you the data type of the value. It is an optional field.

3. VL (Value Length) - Length of the value.

4. Value Field - This is where the actual value is stored.

The following figure illustrates how the data is stored.



In the figure above, you can see the data elements consisting of TAG, VR, Value Length and Value field.

Tag - A hexadecimal value that uniquely identifies a piece of information.
VR (Value Representation) - Optional field.
You can see the VR only in an Explicit VR transfer syntax file.
If the transfer syntax is Implicit VR, the VR field is omitted.
VR is a 2-byte field. It denotes the data type of the value.
VL (Value Length) - Gives you the length in bytes of the value that follows.

4 bytes      2 bytes   2 bytes   C bytes (in hex) or 12 bytes
0010 0010    PN        0C        Thomas Jones
Tag          VR        VL        Value Field



0010 0010 - Occupies 4 bytes. Denotes that the information is the Patient Name.
PN - Occupies 2 bytes. 'PN' is 'Person Name'. It denotes that the data type of the value is a person name string.
0C (hexadecimal value) - 12 in decimal, the number of bytes used to store the value 'Thomas Jones'.
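The Explicit VR layout above (4-byte tag, 2-byte VR, 2-byte VL, then the value) can be read with Python's struct module. This is a minimal sketch assuming Little Endian byte order; the function name parse_explicit_short is my own. It does not handle the VRs that, to my understanding, use a longer length field (such as OB, OW and SQ).

```python
import struct


def parse_explicit_short(buf: bytes, offset: int = 0):
    """Parse one Explicit VR Little Endian element that has a 2-byte length."""
    group, element = struct.unpack_from("<HH", buf, offset)   # 4-byte tag
    vr = buf[offset + 4:offset + 6].decode("ascii")           # 2-byte VR
    (length,) = struct.unpack_from("<H", buf, offset + 6)     # 2-byte VL
    start = offset + 8
    value = buf[start:start + length]
    return (group, element), vr, length, value


# Build the example element: (0010,0010) PN 0C "Thomas Jones".
raw = (struct.pack("<HH", 0x0010, 0x0010) + b"PN"
       + struct.pack("<H", 12) + b"Thomas Jones")

tag, vr, length, value = parse_explicit_short(raw)
print("(%04X,%04X)" % tag, vr, length, value.decode("ascii"))
```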

4 bytes      4 bytes   C bytes (in hex) or 12 bytes
0010 0010    0C        Thomas Jones
Tag          VL        Value Field


Note that the VR is missing in the above figure.
Value Length occupies 4-bytes here. So, in an implicit VR file, VL occupies 4-bytes!
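For comparison, an Implicit VR element has no VR field and a 4-byte length. The sketch below assumes Little Endian byte order, and the name parse_implicit is hypothetical.

```python
import struct


def parse_implicit(buf: bytes, offset: int = 0):
    """Parse one Implicit VR Little Endian element: 4-byte tag, 4-byte VL, value."""
    group, element, length = struct.unpack_from("<HHI", buf, offset)
    start = offset + 8
    value = buf[start:start + length]
    return (group, element), length, value


# The same Patient Name element, this time without the VR field.
raw_implicit = struct.pack("<HHI", 0x0010, 0x0010, 12) + b"Thomas Jones"

tag_i, length_i, value_i = parse_implicit(raw_implicit)
print("(%04X,%04X)" % tag_i, length_i, value_i.decode("ascii"))
```

Because the data type is not stored, a reader of an Implicit VR file has to know from the tag itself how to interpret the value.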

How do I determine whether the file is Explicit VR or Implicit VR?

This is important information that you need before parsing the file.
It is stored in the (0002,0010) Transfer Syntax tag.
Note: Group 2 elements are always Explicit VR.

How do I determine whether the file is Little Endian or Big Endian?

This information is stored in the same (0002,0010) Transfer Syntax tag.
Note: Group 2 elements are always in EXPLICIT VR LITTLE ENDIAN.
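Since group 2 is always Explicit VR Little Endian, a program can read (0002,0010) before it knows anything else about the file. The following is a minimal sketch under several assumptions: the file is well formed, the set of VRs with a 4-byte length (LONG_VRS) is the one I recall from the standard and may be incomplete for newer editions, and all names (read_transfer_syntax, syntax_flags, the helper builders and the sample bytes) are hypothetical.

```python
import struct

# VRs that, in Explicit VR encoding, use 2 reserved bytes plus a 4-byte length.
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def read_transfer_syntax(data: bytes) -> str:
    """Return the Transfer Syntax UID stored in (0002,0010)."""
    if data[128:132] != b"DICM":
        raise ValueError("missing DICM prefix")
    offset = 132
    while offset + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, offset)
        if group != 0x0002:
            break  # we have left the file meta group
        vr = data[offset + 4:offset + 6]
        if vr in LONG_VRS:
            (length,) = struct.unpack_from("<I", data, offset + 8)
            start = offset + 12
        else:
            (length,) = struct.unpack_from("<H", data, offset + 6)
            start = offset + 8
        if element == 0x0010:
            text = data[start:start + length].decode("ascii")
            return text.rstrip("\x00 ")  # drop the padding byte, if any
        offset = start + length
    raise ValueError("Transfer Syntax UID (0002,0010) not found")


def syntax_flags(uid: str):
    """Return (explicit_vr, little_endian) for a Transfer Syntax UID."""
    explicit_vr = uid != "1.2.840.10008.1.2"
    little_endian = uid != "1.2.840.10008.1.2.2"
    return explicit_vr, little_endian


def _short(element, vr, value):
    """Build a group 2 element with a 2-byte length (made-up test data)."""
    return (struct.pack("<HH", 0x0002, element) + vr
            + struct.pack("<H", len(value)) + value)


def _long(element, vr, value):
    """Build a group 2 element with reserved bytes and a 4-byte length."""
    return (struct.pack("<HH", 0x0002, element) + vr + b"\x00\x00"
            + struct.pack("<I", len(value)) + value)


# Made-up header: group length, file version, transfer syntax.
body = (_long(0x0001, b"OB", b"\x00\x01")
        + _short(0x0010, b"UI", b"1.2.840.10008.1.2.1\x00"))
sample = (bytes(128) + b"DICM"
          + _short(0x0000, b"UL", struct.pack("<I", len(body))) + body)

uid = read_transfer_syntax(sample)
print(uid, syntax_flags(uid))
```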

I would like to find the patient name in the dataset. How do I go about it?

First determine the Transfer Syntax.
Then search for the tag '0010 0010' in the file. The tag is a 4-byte hexadecimal number. Once you have located the tag, read the value length and then the value.
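One way to do the search is to step through the data elements in order instead of scanning for the raw bytes, which avoids false matches inside other values. The sketch below is only an illustration under stated assumptions: it works on the bytes that follow the group 2 header, it expects the caller to pass in the Explicit/Implicit and byte-order flags derived from the Transfer Syntax, it relies on elements being stored in ascending tag order, and it gives up on elements of undefined length. The names find_element and _implicit, and the sample data, are hypothetical.

```python
import struct

LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}  # assumed list, may be incomplete
UNDEFINED_LENGTH = 0xFFFFFFFF


def find_element(dataset: bytes, wanted, explicit_vr: bool, little_endian: bool = True):
    """Walk top-level elements in order; return the raw value of `wanted` or None."""
    endian = "<" if little_endian else ">"
    offset = 0
    while offset + 8 <= len(dataset):
        tag = struct.unpack_from(endian + "HH", dataset, offset)
        if explicit_vr and dataset[offset + 4:offset + 6] in LONG_VRS:
            (length,) = struct.unpack_from(endian + "I", dataset, offset + 8)
            start = offset + 12
        elif explicit_vr:
            (length,) = struct.unpack_from(endian + "H", dataset, offset + 6)
            start = offset + 8
        else:
            (length,) = struct.unpack_from(endian + "I", dataset, offset + 4)
            start = offset + 8
        if tag == wanted:
            return dataset[start:start + length]
        if tag > wanted or length == UNDEFINED_LENGTH:
            return None  # passed the tag, or hit something this sketch cannot skip
        offset = start + length
    return None


def _implicit(group, element, value):
    """Build an Implicit VR Little Endian element (made-up test data)."""
    return struct.pack("<HHI", group, element, len(value)) + value


dataset = (_implicit(0x0008, 0x0060, b"MR")
           + _implicit(0x0010, 0x0010, b"Thomas Jones"))

name = find_element(dataset, (0x0010, 0x0010), explicit_vr=False)
print(name.decode("ascii"))  # Thomas Jones
```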

How well does my encryption compare to the DICOM PART 15 - SECURITY PROFILES DOCUMENT?

The attributes listed in the "Security Profile Document" typically need to be protected to keep the patient from being identified. My program protects 19 attributes, 14 of which match those recommended below.

Security Profile Document


Here are the 5 extra attributes I also protect:

Study Date (0008,0020)
Series Date (0008,0021)
Acquisition Date (0008,0022)
Image Date (0008,0023)
Modality (0008,0060)


Medical Imaging at McMaster and Henderson Hospitals

Currently, many of the medical images taken are stored as you see to the right, but a transition to entirely digital storage is quickly taking shape.


Below is an ultrasound viewed on a computer terminal. It is stored in a digital format and no hard copy is ever produced. The most common formats are DICOM, BMP, AVI and JPEG.

