By Diganta Bose, BE, MS (Bioinformatics), Clinical Programmer
The Idea
Writing SAS programs for SDTM conversion can be a cumbersome process when the raw data are not CDASH compliant or are legacy data that did not follow any particular standard for data capture (in the CRF and during data entry into a designed DBMS screen). This task can be simplified to a great extent if clinical programmers have utility macros (reusable substitutes for repeated lines of programming) at hand. Let us see how and why utility macros may be used.
- Utility macros fine-tune programming tasks
- Utility macros can be used to avoid repetitive programming tasks
- The major objective is to reduce a moderately large programming task to just filling in (passing) parameters to a function (or a macro) that will do the clinical programming for you.
- To develop a utility program, we need to understand and analyze the variation of the data that the program is supposed to handle.
The idea is to study extensively how an SDTM variable (or a group of similar SDTM variables) is captured across studies. This study should span all the ways in which the variable(s) are defined. A clinical programmer can then write a program that takes all possible algorithms or rules into consideration when creating the variable(s). This program can then be turned into a macro with parameters, which can be called to do the programming for you whenever you need to create the SDTM variable(s). Imagine the ease of a regular SAS programmer’s job when prewritten executable programs are available, in the form of macros, for 60-80% of SDTM programming tasks. If such macros are developed, they can save a lot of programming time under stringent timelines.
Some Implementation Examples
Before a clinical programmer starts to develop a utility macro, he or she should first understand how the SDTM variable(s) in question have to be derived, whether by a rule or by a simple assignment. Let us consider some of the SDTM timing variables (AESTDTC, AEENDTC, LBDTC, and so on) that hold dates and times from raw data in ISO 8601 format. A utility macro can be developed that accepts all possible date and time formats as inputs, including the partial dates and times that can be captured in raw data, and converts them to ISO 8601 format. This utility macro can be called each time a clinical programmer wants to create DTC variables, across all applicable SDTM domains and across all studies.
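The date conversion idea can be sketched as follows. This is only an illustration written in Python rather than as a SAS macro, and it assumes the raw date arrives as separate year, month, day, and time components; the function name to_iso8601 and the list of "unknown" placeholders are hypothetical. The sketch simply truncates the result at the first missing component instead of imputing a value.

```python
import re

# Month abbreviations as they might appear in raw data (assumed input style).
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Placeholders a raw database might use for an unknown component (assumed).
UNKNOWN = {"", "UN", "UNK", "UNKNOWN", "XX", "XXX", "XXXX", "--", "---"}


def _clean(value):
    """Normalize a raw component; return None when it is missing or unknown."""
    text = "" if value is None else str(value).strip().upper()
    return None if text in UNKNOWN else text


def to_iso8601(year=None, month=None, day=None, time=None):
    """Build an ISO 8601 string, truncated at the first missing component."""
    year, month, day, time = (_clean(v) for v in (year, month, day, time))
    if year is None or not re.fullmatch(r"\d{4}", year):
        return ""
    result = year

    if month is None:
        return result
    # Accept either a month number ("3", "03") or a month name ("MAR", "March").
    month_num = int(month) if month.isdigit() else MONTHS.get(month[:3])
    if month_num is None or not 1 <= month_num <= 12:
        return result
    result += f"-{month_num:02d}"

    if day is None or not day.isdigit() or not 1 <= int(day) <= 31:
        return result
    result += f"-{int(day):02d}"

    # Time is expected as H:MM or HH:MM, optionally with :SS (assumed style).
    match = re.fullmatch(r"(\d{1,2}):(\d{2})(?::(\d{2}))?", time or "")
    if match is None:
        return result
    hour, minute, second = match.groups()
    result += f"T{int(hour):02d}:{minute}"
    if second is not None:
        result += f":{second}"
    return result


print(to_iso8601(year="2011", month="MAR", day="5"))    # 2011-03-05
print(to_iso8601(year="2011", month="mar", day="UNK"))  # 2011-03
```

A production version would also need to validate calendar dates and clock values, and to cover whatever single-field date formats the raw data actually contain.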
Similarly, let us consider variables that have numeric coded formats, such as AESEV in the AE domain and SEX in the DM domain. For instance, the variable AESEV may have codes 1=Mild, 2=Moderate and 3=Severe in one study and codes 0=Mild, 1=Moderate and 2=Severe in another, and the variable SEX may have codes 1=M and 2=F or codes 0=M and 1=F. A utility macro can be developed so that each numeric coded format is passed in as a parameter, letting the clinical programmer control the decoded values when running the macro.
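As a rough illustration of passing a coded format as a parameter, the sketch below (in Python, standing in for a SAS macro) decodes a raw numeric variable through a code list supplied by the caller. The function name decode and the raw variable name SEXCD are hypothetical.

```python
def decode(records, source, target, codes, default=""):
    """Return copies of the records with `target` derived from `source`.

    `codes` is the study-specific code list, passed in as a parameter so the
    same routine serves studies that coded the variable differently.
    """
    decoded = []
    for rec in records:
        new_rec = dict(rec)
        new_rec[target] = codes.get(rec.get(source), default)
        decoded.append(new_rec)
    return decoded


# Two hypothetical studies that coded sex differently in the raw data.
study_a = [{"SUBJID": "001", "SEXCD": 1}, {"SUBJID": "002", "SEXCD": 2}]
study_b = [{"SUBJID": "101", "SEXCD": 0}, {"SUBJID": "102", "SEXCD": 1}]

dm_a = decode(study_a, "SEXCD", "SEX", {1: "M", 2: "F"})
dm_b = decode(study_b, "SEXCD", "SEX", {0: "M", 1: "F"})
print([rec["SEX"] for rec in dm_a], [rec["SEX"] for rec in dm_b])
```

Only the code list changes between the two calls; the decoding logic itself is written once.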
There are numerous opportunities to program utility macros for all or at least most SDTM variables that follow similar rules or algorithms. Thus time can be saved because repetitive clinical programming tasks are avoided.
Conclusion: Reusability and Flexibility
A utility program that takes into consideration the broad spectrum of data, rules, and algorithms it needs to handle as input, and that generates the desired output, should be not only flexible and reusable but also robust enough to work under multiple conditions. Such macros should be stress tested thoroughly before being released for production. Robust and powerful utility macros can make the SDTM mapping process very flexible and quick. Enough documentation should be provided so that any clinical programmer other than the developer can use the macros and understand their parameters, behavior, and so on. Implementing updates should be quick; the clinical programmer just has to make minor updates to the utility macro and rerun it rather than going to each line of individual programs and manually programming the updates. In this way, tedious clinical programming tasks can be reduced to defining macro parameters under this kind of “Modular Programming” approach.
Macros are useful for defining “canned” code that may be made available to other users in an organization. But there are also simple, less sophisticated macros that are useful in your day-to-day coding efforts, if only to reduce keystrokes. We often find ourselves reaching for the same, familiar tools as we deal with the typical coding and data analysis tasks common to many of our occupations. While robust, parameterized macros may be overkill for much of what we do, there is enough repeatable coding activity taking place that smaller utility macros can save us development time and typing effort and reduce errors.
References
Droogendyk, Harry. “Quick ‘n Dirty - Small, Useful Utility Macros.” Paper SBC-125. Stratia Consulting Inc., Lynden, ON.
*Note that the above article talks about feasibility of macros in the SAS language and not macros in general.
SAS and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
By James Zuazo, MS, Justin Sjogren, MS, and Christopher Hurley, MBA, Clinical Programming & Biostatistics
When Study Data Tabulation Model (SDTM) datasets are part of a submission, the Center for Drug Evaluation and Research (CDER) performs a series of checks to ensure data integrity. When reviewing the SDTM datasets, CDER observes issues that are repeated across various studies and submissions. To make certain these common issues are addressed before submitting SDTM datasets, let’s review them and ensure they don’t happen to us. The points raised below are in no particular order.
1. MedDRA Terms:
CDER has noticed that MedDRA terms are not always properly implemented by the sponsor. System Organ Class (SOC) is a common area where MedDRA terms are not accurately used. The SOC terms are often longer than both Higher Level and Preferred Terms and are more prone to misspellings. Merging MedDRA-related SDTM datasets with the dictionary for the desired MedDRA version will ensure that all misspellings are caught.
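The merge against the dictionary can be sketched as a simple lookup. The example below is an illustration in Python, not the authors' program; it assumes the SOC text is held in AEBODSYS and uses a tiny hand-typed list in place of the licensed MedDRA dictionary. The function name check_soc_terms is hypothetical.

```python
import difflib

# A few SOC names standing in for the full dictionary (illustrative subset).
DICTIONARY_SOCS = [
    "Cardiac disorders",
    "Gastrointestinal disorders",
    "Nervous system disorders",
]


def check_soc_terms(records, dictionary_socs, soc_var="AEBODSYS"):
    """Return (term, suggestion) pairs for SOC values not in the dictionary."""
    findings = []
    for term in sorted({rec[soc_var] for rec in records}):
        if term not in dictionary_socs:
            # Suggest the closest dictionary entry, if any is reasonably near.
            close = difflib.get_close_matches(term, dictionary_socs, n=1, cutoff=0.8)
            findings.append((term, close[0] if close else None))
    return findings


ae = [
    {"USUBJID": "S1-001", "AEBODSYS": "Cardiac disorders"},
    {"USUBJID": "S1-002", "AEBODSYS": "Gastrointestinal disoders"},
]
print(check_soc_terms(ae, DICTIONARY_SOCS))
```

Any term the check returns is a value that does not match the dictionary exactly and should be reviewed before submission.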
2. “SDTM Like” Datasets:
It is tempting to try to turn an SDTM dataset into an analysis dataset. However, this is not an approved CDER practice. SDTM datasets should not include any imputed data. When there is a need to impute data, both analysis datasets (ADaM) and supporting documentation should be created. For traceability, SDTM datasets should be part of an ADaM submission. CDER likes to see the progression from the collected data to SDTM to ADaM.
3. File Size:
The size of a dataset should be considered before submitting to CDER. While nothing can be done about the number of observations in the datasets, one effective way to reduce dataset size is to minimize variable lengths. The practice of setting variable lengths to an arbitrary maximum length of 200 is not a CDER preferred method. Instead, variable lengths should be set according to the maximum observed variable length and adhere to the suggested SDTM Implementation Guide lengths. As an example, let’s review the common SDTM variable DOMAIN. Since this variable can only contain two-character abbreviations like AE or DM, setting a length of 2 would be appropriate for DOMAIN.
When a dataset reaches or exceeds a size of 1 GB, CDER would prefer splitting the dataset into smaller datasets. The rules for properly splitting datasets are outlined in the SDTM Implementation Guide v3.1.2 under section 4.1.1.7. Before a dataset is split, make sure to set appropriate variable lengths. For additional information about splitting files, CDER can be contacted at eData@fda.hhs.gov.
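Finding the maximum observed length of each character variable is a small computation. The sketch below shows it in Python under the assumption that values are plain ASCII text, so that character count equals storage length; the function name max_observed_lengths is hypothetical.

```python
def max_observed_lengths(records):
    """Return {variable: longest observed text length} for character variables."""
    lengths = {}
    for rec in records:
        for name, value in rec.items():
            if isinstance(value, str):
                # Use a minimum of 1 so an all-blank variable still gets a length.
                lengths[name] = max(lengths.get(name, 1), len(value))
    return lengths


ae = [
    {"DOMAIN": "AE", "AETERM": "HEADACHE", "AESEQ": 1},
    {"DOMAIN": "AE", "AETERM": "NAUSEA", "AESEQ": 2},
]
print(max_observed_lengths(ae))  # {'DOMAIN': 2, 'AETERM': 8}
```

The resulting lengths would then be compared with the lengths suggested in the Implementation Guide before being applied to the dataset.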
4. Required SDTM Variables:
CDISC describes each variable mentioned in the SDTM guidelines as either Required (Req), Expected (Exp) or Permissible (Perm). The three categories are called Core variables. A Required variable identifies and provides meaning to a given record. For example USUBJID, MHTERM and EGTESTCD are required variables. Of the three variable types, Required variables are mandatory and should be fully populated. However, CDER has found this rule isn’t always followed.
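A check that Required variables are fully populated might look like the following sketch. It is written in Python for illustration; the list of Required variables is supplied by the caller, and the function name missing_required is hypothetical.

```python
def missing_required(records, required):
    """Return {variable: [record indexes]} where a Required value is blank."""
    findings = {}
    for index, rec in enumerate(records):
        for name in required:
            value = rec.get(name)
            # Treat an absent variable, None, or whitespace-only text as missing.
            if value is None or str(value).strip() == "":
                findings.setdefault(name, []).append(index)
    return findings


mh = [
    {"USUBJID": "S1-001", "MHSEQ": 1, "MHTERM": "ASTHMA"},
    {"USUBJID": "S1-002", "MHSEQ": 1, "MHTERM": " "},
    {"USUBJID": "", "MHSEQ": 2, "MHTERM": "MIGRAINE"},
]
print(missing_required(mh, ["USUBJID", "MHSEQ", "MHTERM"]))
```

An empty result means every Required variable in the list is populated on every record.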
5. Subject Identifier:
A unique identifier should be given to each trial participant (subject). The subject identifier USUBJID should be assigned in such a way that it is unique across all studies. Adding leading or trailing spaces to create distinct subject IDs is not an approved CDER practice, because this method causes machine matching issues. A unique subject identifier that is not created properly can result in a request to resubmit the data. Section 4.1.2.3 of the Implementation Guide v3.1.2 gives a clear description of how to accurately create a unique subject identifier. For a given study, merging all subject-based domains with DM will ensure that a subject’s USUBJID value is consistent. Another option would be to run Open CDISC Validator (OCV).
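The merge with DM can be reduced to two simple checks, sketched below in Python as an illustration: USUBJID values that carry leading or trailing spaces, and USUBJID values in a subject-based domain that have no exact match in DM. The function name check_usubjid is hypothetical.

```python
def check_usubjid(dm, domain):
    """Return (padded, unmatched) USUBJID lists for a subject-based domain."""
    dm_ids = {rec["USUBJID"] for rec in dm}
    domain_ids = {rec["USUBJID"] for rec in domain}
    # IDs that differ from their stripped form contain padding spaces.
    padded = sorted(uid for uid in domain_ids if uid != uid.strip())
    # IDs with no exact match in DM would fail a machine merge.
    unmatched = sorted(uid for uid in domain_ids if uid not in dm_ids)
    return padded, unmatched


dm = [{"USUBJID": "ABC-101-001"}, {"USUBJID": "ABC-101-002"}]
ae = [
    {"USUBJID": "ABC-101-001"},
    {"USUBJID": "ABC-101-002 "},
    {"USUBJID": "ABC-101-009"},
]
print(check_usubjid(dm, ae))
```

Note that the padded ID also appears in the unmatched list, which is exactly the matching problem that padding spaces create.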
6. Standard Units:
A standard unit is either selected by a Sponsor or predetermined by an established convention. Examples of such conventions are Lab SI units and CDISC controlled terminology. In general when there is a standard result or finding value, a unit (--STRESU) is assigned. CDER has found that the standard unit that is used is not always consistent, which is an issue. A specific standard unit should be used for a given test. For example, let’s consider pulse. The standard unit is beats per minute, but there are multiple ways to indicate this: BPM, BEATS/MIN, or spelled out. For this test, only one of these units should be used. Whether following or setting a standard unit, it is important to do so consistently.
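Consistency of standard units can be checked by listing every test that appears with more than one unit. The sketch below is an illustration in Python; the variable names follow the vital signs domain (VSTESTCD, VSSTRESU), and the function name inconsistent_units is hypothetical.

```python
from collections import defaultdict


def inconsistent_units(records, test_var, unit_var):
    """Return {test: sorted units} for tests reported with more than one unit."""
    units = defaultdict(set)
    for rec in records:
        unit = rec.get(unit_var)
        if unit:  # skip records without a standard unit
            units[rec[test_var]].add(unit)
    return {test: sorted(found) for test, found in units.items() if len(found) > 1}


vs = [
    {"VSTESTCD": "PULSE", "VSSTRESU": "BEATS/MIN"},
    {"VSTESTCD": "PULSE", "VSSTRESU": "BPM"},
    {"VSTESTCD": "TEMP", "VSSTRESU": "C"},
    {"VSTESTCD": "TEMP", "VSSTRESU": "C"},
]
print(inconsistent_units(vs, "VSTESTCD", "VSSTRESU"))  # {'PULSE': ['BEATS/MIN', 'BPM']}
```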
7. Date Variables:
SDTM date variables should follow an ISO 8601 format. This format is described in detail in section 4.1.4 of the SDTM Implementation Guide v3.1.2. The guide also provides instruction on how to handle incomplete dates and times. A CDER suggestion is to only include time when it is collected on the CRF.
CDER frequently encounters start dates that are later than the end dates. This issue is commonly found in the concomitant medication (CM) and adverse event (AE) domains and is often caused by not implementing the ISO 8601 format correctly. When the collected data is the source of the issue, the inconsistency should be documented and placed in a reviewer’s guide.
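A start-after-end check on ISO 8601 values can be sketched as below. This Python illustration assumes both values are valid ISO 8601 strings that may be truncated (partial), and compares them only to the precision they share, so a partial date is not flagged merely for being partial. The function name start_after_end is hypothetical.

```python
def start_after_end(records, start_var, end_var):
    """Return the records whose start date/time is later than the end."""
    flagged = []
    for rec in records:
        start, end = rec.get(start_var, ""), rec.get(end_var, "")
        if not start or not end:
            continue  # nothing to compare
        # Truncate both values to their common precision; ISO 8601 strings of
        # equal precision sort chronologically when compared as text.
        precision = min(len(start), len(end))
        if start[:precision] > end[:precision]:
            flagged.append(rec)
    return flagged


cm = [
    {"CMSEQ": 1, "CMSTDTC": "2011-03-05", "CMENDTC": "2011-03-10"},
    {"CMSEQ": 2, "CMSTDTC": "2011-04", "CMENDTC": "2011-03-10"},
    {"CMSEQ": 3, "CMSTDTC": "2011-03-12", "CMENDTC": "2011-03"},
]
print([rec["CMSEQ"] for rec in start_after_end(cm, "CMSTDTC", "CMENDTC")])  # [2]
```

Record 3 is not flagged because, at month precision, the start and end fall in the same month.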
Over time, industry experience and CDER’s review of submitted CDISC compliant datasets will change. CDER will eventually generate a new list of commonly found issues. As a CRO, it is our responsibility to submit quality datasets to ensure that we are not guilty of any items on this list.
By Justin Sjogren, MS, Clinical BioStatistician/Programmer
[Note: In August 2011, one of my former Statistics professors at GVSU asked me if MMS would be interested in having a booth at GVSU Statistics Career Day, and if I would be willing to give a talk. My presentation focused around my job as a statistician/programmer in the pharmaceutical industry and was geared towards undergraduate and graduate students who were interested in learning more about what a career in statistics is like from the perspective of a former student. The presentation is summarized below.]
Statisticians play an important role in many phases of the clinical trial process, beginning at the design stage and progressing through the final analysis. Statisticians keep the big picture in mind, and their key strengths include the ability to ask good questions, such as why a particular decision was made, and to ensure that each research question is worthwhile and useful. They provide input early in trial development on matters such as the type of trial design, randomization considerations, and sample size calculations, which identify the number of subjects needed for a successful trial. All of these help to save the sponsor time and cost.
Another key responsibility of the statistician is to write the Statistical Analysis Plan (SAP). This regulatory document tells specifically how the analyses will be performed and what will be reported in the final tables. This is where the statistician can think more deeply about how to answer the research question through the appropriate analyses.
During the clinical trial and as subject data starts being made available, clinical programmers will begin producing draft summary tables and statisticians will begin looking into the draft output to ensure everything is in order. In a blinded clinical trial, the treatment group assignments remain unknown until trial completion, so in the datasets, programmers assign subjects to ‘dummy’ treatment groups, which allows them to produce tables that will look just like the final output, only the treatments that the subjects are assigned to are not real. This allows for the study team to have a look at the tables and provide comments prior to the final analysis.
Some clinical trials will have interim analyses, a special type of analysis done after only a percentage of the subjects have been enrolled. Often, early in drug development, the sponsor may be curious about which dose levels are most effective and best tolerated, so they will assess this at some point during the trial (maybe after 50% of the subjects have been enrolled) rather than waiting until all subjects have been enrolled. This allows the sponsor to drop doses or re-allocate subjects to certain dose groups if any safety or tolerability issues are seen. This can be tricky, however, because the trial is still ongoing and the core study team needs to remain blinded, so an independent group is often used to aid in these decisions. For example, an independent statistician (separate from the sponsor company) will typically be identified to ‘unblind’ the study and perform the required unblinded analyses. Often the statistician will submit the results to a panel of experts (called a data monitoring committee), who review them and decide whether any study modifications are needed and, if so, which. A modification may stem from a particular safety concern and may even lead to trial termination or suspension. These decisions are communicated to the sponsor’s upper management.
In a more traditional clinical trial, when all subjects have completed, the database is locked, the study is unblinded, and statisticians and programmers are charged with creating the final tables, listings and graphs. If the analysis calls for any inferential statistics (such as models or statistical tests), the statisticians will typically create these tables. These tables are sent to the medical writers for incorporation into the clinical study report. Then, statisticians are available to the medical writers for any technical questions regarding the tables.
Statisticians can have a wide variety of duties and responsibilities, but below are some common traits a good statistician will possess:
- Clear and concise writer – there are no style points when writing SAPs, but clear writing is still very important in providing a how-to for another statistician or in providing statistical rationale for statisticians and non-statisticians alike
- Understanding of statistical tools such as SAS® – having this skill is extremely helpful for investigating issues, completing analysis and validation, etc.
- Interpretation of statistical concepts to non-statistical audiences
- Learn from experiences – build off of what has and has not worked in the past
- Strong attention to detail
- Thoroughness
THE CHALLENGE: Project Delays Threaten On-Time Delivery of SAP and Final Study Results
A large pharmaceutical company and their development partner were finalizing the Statistical Analysis Plan (SAP) and preparing for final reporting in order to present at an upcoming shareholder’s meeting. With the presentation date quickly approaching, however, their progress was slow due to delays in finalizing the database and questions related to the quality of the SAP. Concerned they would be unable to compile the necessary data and complete the appropriate analysis on time, they reached out to MMS for help.
MMS SOLUTION: Rapid Response Team Deployed for Additional Expert Assistance
With the client quickly running out of time to meet their deadline, MMS assigned a rapid response team to immediately begin work to revise and finalize the SAP. Part of MMS’s Adaptive Parallel Processing system, the team consisted of highly skilled and experienced experts in the various areas related to the project deliverables. After analyzing rate limiting steps and correcting inaccurate and inconsistent data in the previous analyses, the MMS rapid response team provided a completely revised SAP within two weeks, programming and validation within four weeks, and a top line report of trial results within three days of database lock.
THE OUTCOME: SAP Completed / Trial Results Available / Client Prepared
According to the Sponsor, MMS delivered all the necessary data and reports in a fraction of the time it would have taken their internal resources and their prior collaborating CRO to do the same task. As a result, the pharmaceutical company and their development partner had their SAP in hand and their trial results ready to go well in advance of the shareholder’s meeting.
THE CHALLENGE: Budgetary Issues Bring Phase 2b Study to a Halt
To secure additional funding for continued business operations, a small pharmaceutical company was required to submit a Phase 2b study. However, during development of the study, the company encountered budgetary issues that forced them to discontinue the project. Making matters worse, their clinical research partner declined to participate in any further business activities without a significant operating budget. To get the study back on track, the company turned to MMS with only a few months left to meet their deadline.
MMS SOLUTION: Complete the Study Within the Remaining Time and Budget
Working with the limited funds available, MMS stepped in and immediately prioritized the project deliverables to maximize cost savings and to focus on the most relevant and impactful components of the study. We prepared key data displays and developed concise statistical reports. Instead of assembling as many tables as possible, we saved time and money by identifying only the tables that provided answers to the most pressing questions and cleaned only essential data. It was an exercise in efficient data mining and reporting and, when it was completed, we provided the client with additional biostatistical information to include in their messaging for the review board.
THE OUTCOME: Phase 2b Study Completed and Delivered Successfully
MMS, armed with experience from similar clinical trials and submissions data, was able to deliver the study on time and within the client’s budget. As a result, the company was successful in securing the venture capital funding required for future operations.
By Christopher Hurley, MBA, Justin Sjogren, MS, and James Zuazo, MS, Clinical Programming & Biostatistics
Ingredients: Raw Data (CDASH® brand if possible), CRFs, Protocol, SDTM Implementation Guide, WebSDM (or Open CDISC Validator if you’re on a tight budget), Client Input and 2 to 4 Programmers/Statisticians (4 to 6 for quicker results!).
Directions:
- Begin with a finely grated selection of CDISC SDTM implementation guidelines (for extra spicy, choose version 3.1.2!)
- Set up trial domains and sprinkle with a dash of consistency to other studies if integration is desired.

- Mix raw data, CRFs, Protocol and Sponsor input together and blend purposefully to create Specifications that capture the flavor of the study design and data that was collected.
- Assign programming teams to the mix and develop SAS-based SDTM data domains utilizing the savory Specifications from above.
- For best results, validate these domain datasets using double-programming. This is the preferred method and will enhance your stew with a hearty and satisfying flavor of quality.
- Reconcile all differences between the source and validation programming of the SDTM domains. Note: allow a little extra time for this step.
- Season the SDTM data domains with CDISC compliance using a pinch of Open CDISC Validator or WebSDM, and update your data domains based on the findings.
- Garnish your SDTM domain datasets with updates based on sponsor review comments. Repeat until a perfect blend is achieved then serve with confidence. Note: this is the ultimate taste test. If you slacked off in Steps 2-7, it will show here. Enjoy your Stew!
SDTM stew serves well on its own or as an appetizer to the analysis! Once the raw data is in SDTM format, the process of creating analysis datasets (ADaM) and/or tables will be a mouth-watering treat for your programming and statistics team! Bon appétit!
By Christopher Hurley, MBA, Manager, Clinical Programming & Biostatistics
The Theory of Constraints (TOC) is a unique management philosophy that strives for a rational or scientific approach to management. It provides a way to simplify the complexity of human-based systems and still keep the main issues and impacts under managerial control.
- TOC was initially developed for manufacturing but can be used in any industry
- Compatible with Six-Sigma, Lean and other methodologies
- Provides methodology to develop our own common sense solutions, based on our own circumstances and understanding, not something “canned”
Key Concepts
- Every organization has a goal to achieve. In clinical programming, our goal is to maximize our ability to fulfill requirements
- An organization is more than the sum of its parts. Achievement of goal depends on synchronization of parts or people in a combined effort
- The performance of an organization is constrained by very few variables. Every part or person producing at 100% all of the time is impossible because of interdependencies and timing
Physical Constraints – such as hours in a day, network bandwidth, and licenses
Policy and Paradigm Constraints – such as system access, SOPs, and archaic rules
There are Five Focusing Steps to Break Constraints

Step 1 - Identify the system’s constraint(s)
What don’t we have enough of? Is there any part of the organization that is waiting for something? Is the constraint where it should be?
Step 2 - Decide how to exploit the system’s constraint(s)
How do we get the most with what we’ve got? What does the part of the organization do that others can do? If the constraint can be immediately removed without large investments, do it now and go back to Step 1.
Step 3 - Subordinate everything else to the decisions made in Steps 1 and 2.
Make sure everyone in the organization is aligned with exploitation decisions
Step 4 – Elevate the system’s constraints
Evaluate alternatives and then execute the way you have chosen to elevate the constraints
Step 5 – Don’t allow inertia to be the system’s constraint
The constraint may have moved. Go back to Step 1 and find it.
Analyses of Change
Every improvement is a change but not every change is an improvement. TOC analysis enables us to develop solutions to the right problems using the following types of analyses.
- What to change? Current Reality Tree Analysis
- To what to change? Future Reality Tree Analysis
- How to cause the change? Transition Tree Analysis
- Cause and effect logic is used in each analysis. Each of the aforementioned trees looks like a flowchart showing all of the consequences from each condition or action.
- Each analysis is a blend of common sense, intuition and purposeful action
At MMS we have implemented TOC analyses to determine root causes for various issues and then developed solutions to resolve those issues. TOC is an excellent methodology to employ for quickly developing solutions to the right problems. TOC allows us to develop well thought out plans and initiatives to overcome the obstacles standing in the way of our goals and objectives.
How has the Theory of Constraints affected your workplace?
Reference:
"The Theory of Constraints Way to Overcome Resistance to Change,"- Low, J. Proceedings of the American Production and Inventory Control Society (APICS) International Conference and Exposition, Las Vegas, October 6-9, 2003.
http://www.busadm.wayne.edu/profile.php?id=51
By Diganta Bose, BE, MS, Clinical Programmer
Introduction
The completion of the Human Genome Project (2003) and Bioinformatics research has created an opportunity for a significant rise in a new breed of data in both research and clinical care. This new kind of data, generated as a result of Pharmacogenomics (PGx) research, promises an understanding of molecular pathways and underlying disease risks in populations in a more appropriate, quantitative and qualitative way.
The PGx team, a sub-team within the CDISC SDS Team (Clinical Data Interchange Standards Consortium, Submission Data Standards), has developed several domains designed to carry Pharmacogenomics data. The development of these CDISC PGx domains was done in parallel with the work being done by the HL7 (Health Level Seven, an authority that sets standards in Information Technology for Healthcare Research) Clinical Genomics Work Group (CG), which was initiated jointly several years ago by CDISC and HL7. This creates new opportunities and challenges for the SAS programmers working on CDISC compliant data structures.
Pharmacogenomics - the science and the data
Pharmacogenomics (PGx) is a branch of pharmacology which explains the relationship between genetic variations and drug response in patients by correlating gene expression or SNPs (Single Nucleotide Polymorphisms) with a drug’s efficacy and toxicity. Such studies can help to develop rational means to optimize drug therapy, with respect to the patients' genotype, to ensure maximum efficacy with minimal adverse effects. Such approaches promise the advent of "personalized medicine", in which drugs and drug combinations are optimized for each individual's unique genetic makeup.
Pharmacogenomics information is particularly helpful in cancer trials; pharmacogenomics tests are used to identify which patients will experience toxicity from commonly used cancer drugs and which patients will not respond to them. The tests most commonly include gene expression analysis using microarrays, which can be performed on specific tissue specimens collected from the patients. Other tests may include SNP or other genetic polymorphism analysis, genotyping and a few more. The data one would expect from such PGx tests are mainly biospecimen and genetic (DNA, RNA) sample data, along with their collection dates and times and clinical significance information.
Handling, manipulating and analyzing the data - A SAS Programmer’s challenge
The PGx Findings domain stores key results such as intensity values (both raw and normalized), P-Value, fold-change, ratio, genetic change, amino-acid change, etc. Such data are significantly different from the currently available findings module (Lab data, Vital Signs, Pharmacokinetic data, etc.).
The HL7 CG has developed a Genetic Variation model in conjunction with clinical care participants such as Partners Healthcare and Intermountain Healthcare who are leading the adoption of PGx in healthcare. As part of the HL7 work, LOINC (Logical Observation Identifiers Names and Codes) was extended to include the most commonly used genetic variation tests. CDISC plans to create vocabulary for CDISC TESTCD and TEST which will reside in the NCI EVS (National Cancer Institute’s Enterprise Vocabulary Services) and be a counterpart to the LOINC codes. The NCI is currently working with the group that originally developed the Microarray and Gene Expression (MGED) standards, to validate and populate the Ontologies for Biomedical Investigators (OBI) into EVS. This ontology will be used for the Gene Expression data by both CDISC and HL7.
The initial package contains the following domains: BS-Biospecimen, BE-Biospecimen Event, ES-Extracted sample, PG-Pharmacogenomics, PF-Pharmacogenomics Findings.
Genetic variation data could be anything from complex arrangements that look like random character strings (generally A, T, G, (U), C in the case of DNA/RNA sequence information such as a gene substring) to strings or numbers arranged in an ambiguous array that holds some hidden meaning, which must be decoded by a complex algorithm that a programmer has to implement. The other findings variables contain derived or assigned values that are simple to understand: letters or text, numeric values and discrete values (Yes/No, 0/1, Male/Female). The variables in the PGx domains may not be so simple, and a programmer may need to devise rules and very specific programming to derive certain variables from the captured raw data in order to make them compliant with the CDISC data structure. Programming may involve developing macros to automate standardized algorithms across the domains, and using operations suited to complex strings, such as regular expressions and string functions. Mapping codes will likely involve databases like NCBI (National Center for Biotechnology Information) and GenBank along with medical dictionaries like MedDRA and WHODRUG. All these new kinds of data collected in the PGx domains bring new opportunities for more flexible SDTM (Study Data Tabulation Model) and ADaM (Analysis Data Model) programming for CDISC.
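As a small illustration of the regular expression handling mentioned above, the sketch below (in Python, not SAS) normalizes a raw sequence string and classifies it by the letters it contains. The function name classify_sequence and the returned labels are hypothetical and are not CDISC controlled terminology.

```python
import re

DNA_PATTERN = re.compile(r"[ACGT]+")
RNA_PATTERN = re.compile(r"[ACGU]+")


def classify_sequence(raw):
    """Classify a raw nucleotide string as DNA, RNA, INVALID, or EMPTY."""
    # Remove whitespace (line breaks, spacing in blocks) and normalize case.
    sequence = re.sub(r"\s+", "", raw).upper()
    if not sequence:
        return "EMPTY"
    # A sequence with no T and no U matches both patterns; it is reported
    # as DNA here because that pattern is tested first.
    if DNA_PATTERN.fullmatch(sequence):
        return "DNA"
    if RNA_PATTERN.fullmatch(sequence):
        return "RNA"
    return "INVALID"


print(classify_sequence("atg cca"))  # DNA
print(classify_sequence("AUGC"))     # RNA
print(classify_sequence("ATGX"))     # INVALID
```

Real sequence data may also use ambiguity codes such as N, which this sketch would report as invalid.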
For details read the CDISC Pharmacogenomics news article available on the CDISC website at http://www.cdisc.org/pgx-review-article
MMS Holdings Inc. today announced a financial gift to aid the relief efforts in the areas of Japan affected by the March 11, 2011 devastating 9.0 magnitude earthquake and tsunami. There are thousands of individuals in need in Japan and MMS is one of the many organizations that have stepped up to lend a hand.
MMS Holdings funds will go to the American Red Cross that will use this contribution towards the much-needed healthcare services and medical supply needs of the local victims in Japan.
“Our thoughts go out to our colleagues and friends in Japan along with their families” said MMS Vice President, Prasad Koppolu. “We as an organization hope that our support of the American Red Cross greatly helps a number of Japanese people as well as those involved in the recovery efforts.”
MMS Holdings is a niche pharmaceutical service organization that currently partners with, and has long-standing relationships with, a number of Japan-based pharmaceutical companies. MMS colleagues hope for the quickest possible recovery for the affected communities.

About MMS Holdings Inc.
MMS Holdings Inc. is based in Canton, MI and is a highly experienced pharmaceutical service organization that is focused on quality deliverables in the areas of Clinical Programming, Biostatistics, Medical /Regulatory Writing and Comprehensive Pharmacovigilance. Commitment to Quality deliverables with robust submission experience sets MMS apart from traditional service providers. MMS Holdings Inc. is ISO-9001 certified for all services and maintains detailed quality metrics for every project. For more information visit: http://www.mmsholdings.com