A data breach or disclosure breach happens when data is released that identifies a person, household, business, or organisation. You must acknowledge that there is always a risk of a data breach happening.
How can we use perturbation to protect confidentiality?
Perturbation, which means adding random noise to data, is a widely used data confidentiality method. It works by adding a random value to the data in order to mask it. Perturbation is a best-practice method.
Use a coordinated approach to count and magnitude tables
A count measures the number of individuals whose confidentiality is being protected.
A count magnitude or value magnitude measures a sum of counts or a sum of values relating to the individual data you are protecting. For instance, the human population in an area is a count; how many television sets that population owns is a count magnitude; and how much they earn is a value magnitude. Similarly, the number of businesses in an area is a count; how many people they employ is a count magnitude; and how much profit they make is a value magnitude.
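The three measures can be sketched on a small set of records. This is only an illustration: the field names `employees` and `profit` and all figures are hypothetical and not taken from any real dataset.

```python
# Hypothetical business records for one area; all figures are invented.
businesses = [
    {"name": "A", "employees": 4, "profit": 120_000},
    {"name": "B", "employees": 12, "profit": 310_000},
    {"name": "C", "employees": 1, "profit": 45_000},
]

# Count: the number of units whose confidentiality is being protected.
count = len(businesses)

# Count magnitude: a sum of counts attached to those units.
count_magnitude = sum(b["employees"] for b in businesses)

# Value magnitude: a sum of values attached to those units.
value_magnitude = sum(b["profit"] for b in businesses)

print(count, count_magnitude, value_magnitude)
```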
The ACS includes software, applications, and expertise to help users automatically apply confidentiality methods and produce consistent results. In the NCM method, each individual data record is assigned a uniformly distributed random number. These random numbers are fixed across time, so that the same degree of perturbation is applied to each individual over time.
How to perturb counts
For count tables, the random numbers of the units grouped together in a cell are used to generate a new random number for that cell.
This is the basis for fixed random rounding to base 3 (FRR3).
It ensures the same group of individuals will always be rounded the same way in related tables. In FRR3, you randomly round counts to base 3. Counts that are already multiples of three are left unchanged.
Counts that are not a multiple of three are rounded to the nearest multiple of three two-thirds of the time, and to the next nearest multiple of three one-third of the time. This is to disguise small counts. But since all table data are rounded consistently, they are protected against both differencing attacks, where closely related results might be subtracted from each other to discover underlying small counts, and Monte Carlo attacks, where queries are run again and again to discover the underlying raw numbers from the distribution of results.
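The rounding rule can be sketched as follows. This is a minimal illustration, not the agency's implementation: the function names are hypothetical, and combining the record-level random numbers by summing them modulo 1 is an assumption, since the text does not say how the cell-level number is derived.

```python
def cell_random_number(record_randoms):
    """Combine the fixed per-record random numbers of a cell into one value in [0, 1).

    Assumption: the sum modulo 1 is used purely for illustration; the text
    does not specify how the record-level numbers are combined.
    """
    return sum(record_randoms) % 1.0


def round_base3(count, u):
    """Randomly round a non-negative count to base 3 using a number u in [0, 1)."""
    remainder = count % 3
    if remainder == 0:
        return count  # multiples of three are left unchanged
    lower = count - remainder
    upper = lower + 3
    # The nearest multiple is `lower` when the remainder is 1, `upper` when it is 2.
    nearest, other = (lower, upper) if remainder == 1 else (upper, lower)
    # Nearest multiple two-thirds of the time, the other multiple one-third of the time.
    return nearest if u < 2 / 3 else other


# The same group of records always yields the same u, and so the same rounded value.
u = cell_random_number([0.25, 0.5, 0.75])
print(round_base3(7, u), round_base3(7, u))
```

Because `u` depends only on which records are in the cell, repeating the query cannot produce a different rounding. When `u` is uniformly distributed, the expected rounded value equals the original count: for a remainder of 1, 2/3 × (n − 1) + 1/3 × (n + 2) = n.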
We can protect information in count tables by random rounding to base 3 (RR3). The counts are randomly rounded to base three in a consistent manner. The aim is to disguise small counts, but all cells in the table are randomly rounded.
The effect is to make the output more confidential, by generally preventing individuals' data from being released.
How does this affect the final data?
For small numbers, where there is the most risk that individuals could be identified, there are larger percentage changes compared with larger numbers.
For example, a cell with a one changed to a three has been changed by 200 percent, but a cell with a count in the thousands that moves by one or two has been changed by only a fraction of a percent. When analysing data, treat small counts with caution; for larger values, the percentage changes in these cells do not cause a problem.
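The arithmetic behind this comparison can be sketched briefly. The counts below are illustrative only and the helper name `percent_change` is hypothetical; the point is that rounding to base 3 moves a count by at most 2, so the relative effect shrinks as the count grows.

```python
def percent_change(original, rounded):
    """Percentage change between an original count and its rounded value."""
    return 100 * (rounded - original) / original


# Rounding to base 3 moves a count by at most 2, so the relative effect
# shrinks as the count grows. The counts below are illustrative only.
for original, rounded in [(1, 3), (10, 12), (1000, 1002)]:
    print(original, rounded, percent_change(original, rounded))
```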
The noise protects sensitive data where there is a disclosure risk but cancels itself out in larger collections of data.
How can we use aggregation to protect confidentiality?
Aggregation involves grouping categories together.
You avoid disclosure by combining columns or rows into one new group. You combine or simplify data outputs. This reduces the amount of data available about individuals.
Striking a balance between releasing data and saving labour
In the long run, aggregation is effective for striking a balance between releasing as much data as possible and limiting the work involved in producing tables.
Aggregation is useful when there are many cells with small numbers. By collapsing categories or combining data cells, you remove much of the sensitivity in the table. You need subject matter knowledge to use this method. You need to know which values in the data are important for your data users, and how values have been aggregated in the past, so you can apply aggregation consistently.
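Collapsing categories can be sketched as a simple mapping from detailed categories to broader groups. The age bands, counts, and grouping rule below are hypothetical; in practice the grouping would come from subject matter knowledge and from how values were aggregated in the past.

```python
from collections import defaultdict

# Hypothetical counts by detailed age band; several cells are small.
detailed_counts = {"0-4": 2, "5-9": 1, "10-14": 4, "15-19": 25, "20-24": 31}

# Hypothetical collapsing rule mapping each detailed category to a broader group.
collapse_to = {
    "0-4": "0-14", "5-9": "0-14", "10-14": "0-14",
    "15-19": "15-24", "20-24": "15-24",
}


def aggregate(counts, mapping):
    """Sum detailed counts into the broader groups given by `mapping`."""
    totals = defaultdict(int)
    for category, n in counts.items():
        totals[mapping[category]] += n
    return dict(totals)


aggregated = aggregate(detailed_counts, collapse_to)
print(aggregated)
```

Keeping the rule in one shared mapping is one way to apply the same aggregation consistently across related tables.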
Aggregation lowers the amount of detail in the final output data. You need to ensure that the resulting dataset is still useful for your users.
Good data classifications and standards make aggregation easier
To maximise flexibility, code data at the lowest level of the classification possible.
Classifications and standards should: have an underlying conceptual basis; fit within a statistical framework that is intuitive and easy to understand, navigate, and apply; be internationally comparable when you need to compare data across countries; and be stable and comparable over time, balanced with the need to update classifications from time to time. Classifications and standards must be systematic and operationally feasible.
If the size of the residual group grows considerably, you need to revise the classification system. To minimise bias in the data, use automated processes and methods, such as coding tools, where practical. Ensure classifications are hierarchical, with a main group level that you break down further into lower classification levels. Use a common collapsing strategy for aggregations. Give classifications names that reflect both the most detailed and the collapsed levels.
How can we use suppression to protect data confidentiality?
When you suppress data, you do not report selected data. Suppression means removing from an output any data that reveal individualised information.
Suppress data by not reporting some data outputs
If a data value reveals too much about a person, household, or business, you can remove it from the output by suppressing it. This is primary suppression.
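Primary suppression of small counts can be sketched as below. The threshold of 6, the marker `"S"`, and the table contents are assumptions made for illustration; the text does not prescribe a specific rule for deciding which values reveal too much.

```python
SUPPRESSED = "S"  # hypothetical marker for a suppressed cell


def suppress_small_cells(table, threshold):
    """Replace every count below `threshold` with a suppression marker."""
    return {
        cell: (n if n >= threshold else SUPPRESSED)
        for cell, n in table.items()
    }


# Hypothetical counts; the threshold of 6 is an assumption, not a rule from the text.
table = {"Region A": 120, "Region B": 3, "Region C": 48}
published = suppress_small_cells(table, threshold=6)
print(published)
```

This covers only the primary step. If row or column totals are also published, a suppressed cell could be recovered by subtraction, so further cells may need protecting as well.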
The main strategies used are strong recruitment; professional and career development programs in all core professional groups; maintaining programs that build a positive, exciting and healthy workplace; introducing programs to stimulate creativity and innovation, particularly at the "grass roots" level; and participating actively, nationally and internationally, in professional communities engaged in official statistics.
Statistics Canada continues to ensure that the workforce is professional, motivated and innovative; a strong focus on its workforce is a recognized hallmark of the agency.
Data quality
To ensure that the most appropriate methods and procedures are being used, the agency has developed and implemented a series of governing instruments to guide the many statistical processes within the organization.
At the highest level, the agency has developed a Quality Assurance Framework, which provides analysts with the definition of data quality and with standards by which to measure it. Within this framework, the quality of information is defined in terms of a multi-dimensional concept that embraces both the relevance of information to users' needs and the characteristics of this information, such as accuracy, timeliness, accessibility, interpretability and coherence.
Transparency about these various dimensions helps users judge the extent to which a statistical product is fit for a specific purpose. A significant feature of quality management, which is highlighted in the framework, is balancing quality objectives against the constraints of financial and human resources, the goodwill of respondents in providing source data, and the competing demands for greater quantities of information.
Another way in which quality considerations are embedded into the practices of the organization is through the agency's Management Committee on Methods and Standards.
This committee provides advice and guidance on developing and applying statistical standards, approving and adopting statistical concepts, developing and using sound statistical methods, and setting priorities for statistical research and innovation.
In addition, Statistics Canada has an external Advisory Committee on Statistical Methods, which advises the Chief Statistician on the use of efficient statistical methods in the agency's programs, and on the agency's program of research and development in statistical methods.
The committee's members are experts from private industry and academia. Internally, Statistics Canada has a Quality Secretariat dedicated to supporting the development and implementation of policies and procedures that promote sound quality management practices; to designing and managing studies related to quality management; and to providing advice and assistance to program areas on quality management.
Communicating about data quality
In addition to applying rigorous quality-assurance mechanisms in order to provide data users with reliable statistical information, the agency is also responsible for informing users about data quality.
The Policy on Informing Users of Data Quality and Methodology requires that all statistical products include or refer to documentation on data quality and methodology.
These standards and guidelines describe the kind of documentation that is expected for each data release. This policy also requires that, for each statistical program, users be provided with the information necessary to understand both the strengths and limitations of the data being disseminated. Documentation on methodology must permit users to assess whether the data adequately approximate what they wish to measure, and whether the data were produced within the tolerances accepted for their intended purpose.
Extensive documentation on quality, concepts and methodology, and other explanatory information, are also made available via Statistics Canada's Integrated Metadatabase on its website.
NSOs continuously aim to introduce and maintain methodological improvements in concepts, methods and procedures to improve official statistics. Statistics Canada is responsible for informing users of the concepts and methodology used in collecting, processing and analyzing its data; of the accuracy of these data; and of any other features that affect their quality, or "fitness for use."
Commentary in The Daily, and in associated materials, focuses on the primary messages that the new information contains. Directed particularly at the media, such commentary increases the likelihood that the first level of interpretation to the public will be clear and correct and increases the likelihood that mass media will integrate the material in its output, thus making it visible to vast audiences.
Such mass dissemination is a key ingredient in ensuring the visibility and credibility of an official statistics agency. The Policy on Highlights of Publications requires that all statistical publications contain a section that highlights the principal findings in the publication. Statistics Canada's standards and guidelines for the provision of metadata derive from the Policy on Informing Users of Data Quality and Methodology.
The policy lays out requirements and guidelines on how to provide information on data quality and methodology with every statistical product. The Integrated Metadatabase is the repository used to store this information for each survey, in addition to other related metadata. Statistics Canada participates in international committees to be fully aware of the current standards in the dissemination of metadata. The agency's experts also write scientific papers on methods, present them to the public, and make them available for use by the public.
Principle 4: The statistical agencies are entitled to comment on erroneous interpretation and misuse of statistics.
Statistics can be used and interpreted in many different ways, and they may also be used for advertising and political purposes.
It is therefore important for NSOs to maintain trust and credibility by drawing attention to obvious public misuse or misinterpretation. Statistics Canada's Directive on Media Relations sets forth the guidelines for responding to erroneous statements in news reports. When news coverage in print or online contains erroneous statements about Statistics Canada and its programs or policies, or misinterpretations of data, communications staff send a formal response to the media to request that the information be updated online, or that a correction be issued in the newspaper.
Statistics Canada carries out activities to educate users, including the media, on how to use data. One example is "concept brief" presentations for the Census of Population and the Census of Agriculture, which are posted on the website in advance of the data being released. While data or findings are not discussed, members of the media receive a briefing on census concepts prior to release day so that they can readily understand the data being released.
A second example is that media are briefed about important changes or revisions to data prior to their release. For example, when Statistics Canada revised data from the System of National Accounts, members of the media who attended lockups for economic data received a briefing.
Media outlets were also informed of the changes through a statistical announcement posted on the agency's website. The Media Relations service of Statistics Canada holds lockups for major economic indicators and census releases, during which subject-matter experts are present to answer questions related to the data released.
Principle 5: Data for statistical purposes may be drawn from all types of sources, be they statistical surveys or administrative records.
When choosing the source, statistical agencies must consider quality, timeliness, costs and the burden on respondents. Producing official statistics is a costly and labour-intensive task for statistical agencies, and is demanding from a respondent perspective. Therefore, statisticians have to apply methods in the least intrusive way and choose the most cost-efficient data sources, without compromising data quality.
The Statistics Act confers substantial powers on Statistics Canada to obtain information for statistical purposes through surveys of Canadian businesses and households. By default, response to Statistics Canada's surveys is mandatory under the act; refusal to participate could be subject to legal penalty. The act includes provisions to make participation in some surveys voluntary, and Statistics Canada has generally done so for household surveys other than the Census of Population and the Labour Force Survey, which produces critical economic data.
Surveys of businesses, including agricultural businesses, are conducted on a mandatory basis. Statistics Canada can also, by law, access all administrative records, including tax data, customs declarations, and birth and death records. Such records are very important sources of statistical information, because they reduce response burden on business and individual respondents. When feasible, Statistics Canada uses administrative data, to reduce the burden on businesses and households.
Additional efforts in using administrative data to reduce the burden on Canadians and businesses will continue to be a focus of the agency for years to come. While legal authority is a most useful tool, the agency favours relying on collaborative partnerships to secure access to administrative data. Partnerships with other federal departments, other jurisdictions, and external organizations play a large role in reducing response burden.
Statistics Canada continues to foster these arrangements as they serve the needs of stakeholders, the national statistical system and the Canadian research community.
In tandem with these powers, the agency is charged with ensuring the confidentiality of the information in its hands, and with limiting the use of this information to statistical purposes. A fundamental requirement for official statistics is protecting confidentiality.
This requirement is expected to be strictly implemented in each and every aspect of the statistical process—from survey planning to dissemination of statistical products.
Confidentiality protection
The strong power given to Statistics Canada to collect and access information is counterbalanced by a guarantee of confidentiality: all agency employees are personally liable for ensuring statistical confidentiality, and even courts cannot have access to individually identifiable statistical information without the informed consent of respondents. The most important specific tool to deal with this matter is the Statistics Act, which spells out the agency's obligations and the personal liability of all employees.
This message is reinforced through training, starting with an introductory course; physical perimeter security, which serves as a daily reminder; an especially secure computing environment that makes it physically impossible to penetrate the network, thus preventing access by potential hackers; and an extremely strong cultural tradition that is passed on from generation to generation. In addition, various corporate committees have been put in place to ensure proper access and protection of individual data: the Microdata Access Management Committee, the Information Management Committee, the Communications and Dissemination Committee, and the Security Coordination Committee.
Finally, a Disclosure Control Resource Centre conducts and coordinates research for the protection of respondent confidentiality in data disseminated by Statistics Canada.
Privacy protection
All statistical surveys represent a degree of privacy invasion, which is justified by the need for an alternative public good, namely information. The relevant issues are the methods used to ensure that questionnaire content is minimally intrusive, that respondents are informed of the purposes to be served by the data collection, and that the total reporting burden imposed on the population is regularly measured, controlled, and equitably distributed.
A special issue relates to the very sensitive topic of record linkage.
Given the wide scope for record linkage within a centralized statistical system, particularly that of Statistics Canada, which has broad access to the data holdings of other departments, the agency developed a multi-level review procedure, as well as extensive ongoing consultation mechanisms with stakeholder groups and the Office of the Privacy Commissioner of Canada. These mechanisms aim to ensure that all record linkage activities serve a clear public-interest purpose and that linked data will be retained only as long as operationally required.
Principle 7: The laws, regulations and measures under which the statistical systems operate are to be made public.
High-quality legislation is critical to the effective performance of a national statistical system.
In addition, under the Corporations Returns Act, Statistics Canada collects financial and ownership information on corporations conducting business in Canada. One of the Chief Statistician's key responsibilities is to ensure that Statistics Canada's operations are transparent and independent of government influence. The agency is proactive and fully transparent in disclosing its methods and standards. Under the guidelines of the Policy on Informing Users of Data Quality and Methodology, Statistics Canada applies rigorous quality-assurance mechanisms to provide data users with reliable statistical information.
The agency is also responsible for informing users about data quality, which involves applying consistent measures to identify, record, approve, and correct post-release errors and unplanned revisions, and to report thereon. Statistics Canada keeps Canadians informed of the agency's various priorities and activities through regular reports, such as the Report on Plans and Priorities, the Departmental Performance Report, and the Corporate Business Plan.
In addition, the agency regularly communicates with the public using communication vehicles such as blogs, other social media, videos, and statistical announcements.
Principle 8: Coordination among statistical agencies within countries is essential to achieving consistency and efficiency in the statistical system.
According to the United Nations Statistics Division, General Review, "No matter what the organizational arrangements are for producing official statistics, coordination of NSO should be undertaken to avoid duplication of work, and to facilitate the integration of data from different sources through the use of statistical standards."
As part of this effort, it permits the agency to enter into two kinds of joint collection and data-sharing agreements: (1) with any government department, provided that respondents are notified and that they register no objection; and (2) with a provincial statistical agency that has legislative confidentiality protection comparable with that of Statistics Canada.
Furthermore, the Statistics Act requires that Statistics Canada coordinate the national statistical system, specifically to avoid duplication in the information collected by government.