Difference between revisions of "Data Structures and Algorithms for Engineers"

Latest revision as of 09:00, 22 November 2017

Course description

Many organizations today are incorporating computer hardware and software into the products that they design and build. Most of these organizations' primary competencies are not computer science or software engineering, but rather they find that automation makes their products smarter, more capable, and more appealing in the market place. Because deep domain knowledge is needed to build these products, these organizations often hire engineers from traditional engineering disciplines to design and build the product platform, in many cases requiring them to write software to make the product actually work. These are capable engineers from many disciplines other than software engineering and unfortunately they usually learn software engineering on the job. This process typically involves considerable trial and error and often results in poorly designed and documented systems, defect laden software, bloated product development costs, unmaintainable software, and missed opportunities to leverage software development investments.

In addition to developing mere functionality, some application domains are often highly constrained and unforgiving in their quality attribute needs such as performance, safety, and availability. These systems intimately depend upon software to provide these capabilities in addition to basic functionality. Designing software intensive systems with these properties in a cost-effective way requires first-class computer science and software engineering expertise. While many practicing engineers often have many years of industrial experience writing software applications, many lack a formal background in computer science principles. These engineers may have had a few courses or technical training in specific computer languages or technologies, but in general they often lack formal training in algorithms, computing theory, data structures, and design among other key topics. The result is that many of these engineers are not fully realizing their potential as software engineers. This course is designed to bridge these gaps in formal computer science training.

Learning objectives

The primary objective of the course is to provide engineers without formal training in computer science, a solid background in the key principles of computer science, in general, and of the algorithms and data-structures, in particular. The key purpose of this course is to complement the experience that engineers may already have in writing software with formal computer science underpinnings, making those engineers more capable in developing software intensive systems.

The course begins by considering the main phases of the software development lifecycle, from requirements elicitation, to computational modelling, system specification, software design, implementation, and software quality assurance, including various forms of testing, verification, and validation. Then, building on the concept of abstract data types, the course provides an in in-depth treatment of the key elements of algorithms and data-structures, beginning with the fundamentals of searching, sorting, lists, stacks, and queues, but quickly progressing to more advanced topics, including trees, graphs, and algorithmic strategies. It also covers the analysis of the performance and tractability of algorithms, finishing with automata theory and computability theory. A key focus of the course is on effective implementation and good design principles.

Outcomes

After completing this course, students should be able to:

Recognize and analyze critical computational problems, generate alternative solutions to problems, and assess their relative merits.
Understand, analyze, and characterize those factors that influence algorithmic computational performance and memory consumption.
Design, implement, and document appropriate, effective, and efficient data structures & algorithms for a variety of real-world problems.
Understand detailed software structures and their underlying strengths and weaknesses.
Perform detailed, code-level design and document the design in an understandable way.

Content details

Refer to the Lecture Schedule for information on course delivery, including lectures, labs, assignments, and exercises.

The course will cover the following topics:

Introduction: The Software Development Life Cycle
Formalisms for Representing Algorithms
Analysis of Complexity
Searching and Sorting Algorithms
Abstract Data Types (ADT)
Containers, Dictionaries, and Lists
Stacks
Queues
Trees
Heaps
Graphs
Complex Networks
Hashing
Algorithmic Strategies
Analysis of Correctness
Automata Theory
Computability Theory

The detailed content for each of these topics follows.

Introduction: The Software Development Life Cycle

Motivation
Goals of the course
Syllabus and lecture schedule
Course operation
Software development tools for assignments
Preview of course material
Levels of abstraction in information processing systems
The software development life cycle: Yourdon Structured Analysis - functional, data, and behavioural models (hierarchical decomposition trees, architecture diagrams, data flow diagrams DFD, data dictionaries, entity relationship ER diagrams, state transition diagrams)
Software process models: waterfall, evolutionary, formal transformation, re-use, hybrid, spiral

Formalisms for Representing Algorithms

Definition of an algorithm
Modelling software
Relational modelling
State modelling
Practical representations
Pseudo code
Flow charts
Finite state machines
UML
Predicate logic
Analysis

Analysis of Complexity

Performance of algorithms, time and space tradeoff, worst case and average case performance
Big O notation
Recurrence relationships
Analysis of complexity of iterative and recursive algorithms
Recursive vs. iterative algorithms: runtime memory implications
Complexity theory: tractable vs intractable algorithmic complexity
Example intractable problems: travelling salesman problem, Hamiltonian circuit, 3-colour problem, SAT, cliques
Determinism and non-determinism
P, NP, and NP-Complete classes of algorithm

Searching and Sorting Algorithms

Linear and binary search (iterative and recursive)
In-place sorts: bubblesort (efficient and inefficient), selection sort, insertion sort.
Not-in-place sorts: Quicksort, merge sort.
Complexity analysis
Characteristics of a good sort
Speed, consistency, keys, memory usage, length & code complexity, stability
Other sorts ordered by complexity

Abstract Data Types (ADT)

Vector example exercise
History of abstraction
Abstract Data Types (ADT)
Information hiding
Types and typing
Encapsulation
Efficiency
Design practices

Containers, Dictionaries, and Lists

Basic operations
Implementation with arrays and linked lists
Singly linked lists
Doubly linked lists
Performance considerations

Stacks

Stack (LIFO): push, pop, peek, size, numItems operations
Array implementation (directly and array of pointers to data)
Stack applications, including evaluation of infix, prefix, and postfix expressions

Queues

Queue (FIFO): enqueue, dequeue, peek, size, numItems operations
Array implementation (directly and array of pointers to data)
Linked list implementation
Circular queues
Performance considerations

Trees

Concepts and terminology: level, height, external and internal nodes, skinny, fat, complete, left-complete, perfect, multi-way, d-ary
Types of tree: binary, binary search, B-tree, 2-3 tree, AVL, Red-Black
Binary trees and binary search trees
Tree traversals: inorder, preorder, postorder
Fixed-length codes, variable length codes, optimal code trees, Huffman's algorithm and implementation
Height-balanced trees: AVL Trees, RR, RL, LR, LL rotations
Height-balanced trees: Red-Black Trees, single promotion, zig-zag promotion, recolouring and restructuring

Heaps

Heap basics
Types of heap: min heaps and max heap
Heap characteristics
Heap operations: delete max/min, down heap, up heap, merge, construct, heapify; complexity of operations
Priority queues
Operating systems heaps
Implementation of heap
Heap sort

Graphs

Types of graphs: directed, undirected, weighted, unweighted, cyclic, acyclic, directed acyclic, simple, non-simple, implicit, explicit, embedded, topological
Adjacency matrix representation,
Adjacency list representation
Topological sort
Euler's theorem
Graph traversal: breadth-first and depth-first, uses of
Depth-first search and maze traversal
Spanning trees and minimum spanning trees, Kruskal's algorithm, Prim's algorithm
Dijkstra's shortest path algorithm
Floyd-Warshall's all-pairs algorithm

Complex Networks

The importance of complex networks and network science
Euler's theorem: the Bridges of Königsberg
Networks vs. graphs
Degree, average degree, and degree distribution
Bipartite networks
Path length, BFS, Connectivity, Components
Clustering coefficient
Random graph model
Small world phenomena
Scale free networks
Communities
Fundamental Hypothesis
Connectedness and Density Hypothesis
Strong and weak communities
Graph partitioning
Community detection
Hierarchical clustering
Girvan-Newman Algorithm
Modularity
Random Hypothesis
Maximum Modularity Hypothesis
Greedy algorithm for community detection by maximizing modularity
Overlapping communities
Clique percolation algorithm and CFinder

Hashing

Using keys to address data
Mappings: injection, surjection, bijection
Map ADT
Hash functions
Hash tables: current value tables, direct access tables
Managing collisions: chaining, overflow areas, re-hashing, linear probing, quadratic probing
Example application: dictionaries

Algorithmic Strategies

Classes of algorithms
Brute force
Divide and conquer
Greedy algorithms
Dynamic programming
Combinatorial search and backtracking
Branch and bound

Analysis of Correctness

Types of software defects
Code module design
Syntactic, semantic, logical defects
(Semi-)formal verification: partial vs. total correctness
Invariant assertion method
Simple proof strategies: by contradiction, counterexample, induction
Dynamic testing: unit tests, test harness, stubs, drivers, integration testing, regression testing.
Static tests: reviews, walkthroughs, inspections, reviewing algorithms and software
Pair programming
Verification and validation strategies

Automata Theory

Regular Languages
Finite Automata
Nondeterminism
Regular Expressions
Nonregular Languages
Context-free Languages
Context-free Grammars
Pushdown Automata
Deterministic Context-Free Languages

Computability Theory

The Church-Turing Thesis
Turing Machines
Variants of Turing Machines
The Definition of Algorithm
Decidability
Decidable Languages
Undecidability
Reducibility

Lecture Schedule

Refer to the Lecture Schedule for information on course delivery, including lectures, labs, assignments, and exercises.

Faculty

David Vernon

Delivery

Face-to-face.

Student assessment

This course includes several hands-on programming and analysis assignments. Students will program in C/C++. The programming assignments include seven individual assignments. In addition to programming assignments, students will be assigned readings to support the lecture material.

Marks will be awarded as follows.

Best six of seven individual assignments: 60%. Mid-semester examination: 10%. Final examination: 30%.

Software tools

We will use Microsoft Visual C++ Express compiler, version 10.0 (also known as Visual C++ 2010) and CMake running on Windows 7 (64 bit).

Please follow the instructions provided in the Software Development Environment installation guide.

Course texts

David Harel and Yishai Feldman, Algorithmics: The Spirit of Computing, Third Edition.

Alfred V. Aho, Jeffrey D. Ullman, and John E. Hopcroft, Data Structures and Algorithms.

A selection of examples will be taken from Steven Skiena, "The Algorithm Design Manual", Second Edition.

A selection of papers and readings will be provided to complement these required textbooks.

Acknowledgments

This syllabus is based mainly on Course 04-630 Computer Science Principles for Practicing Engineers given by Mel Rosso-Llopart and Anthony J. Lattanze at Carnegie Mellon University, USA, Course CS-CO-412 Algorithms and Data Structures given by David Vernon at Innopolis University, Russia, and Course IT706A Scientific Theory in Informatics given by David Vernon and others at the University of Skövde, Sweden.

@@ Line 6: / Line 6: @@
 <span style="color:#AB0000" ><span style="font-size:18px">Data Structures and Algorithms for Engineers</span></span>
-<span style="color:#AB0000">Course discipline: TBD</span>
+<!-- <span style="color:#AB0000">Course discipline: TBD</span> -->
 <span style="color:#AB0000">Core</span>
@@ Line 64: / Line 63: @@
 * Algorithmic Strategies
 * Analysis of Correctness
-* Databases
+<!-- * Databases -->
-* Programming Paradigms
+<!-- * Programming Paradigms -->
 * Automata Theory
 * Computability Theory
@@ Line 95: / Line 94: @@
 * Predicate logic
 * Analysis
 <span style="color:#AB0000">Analysis of Complexity</span><BR>
@@ Line 145: / Line 143: @@
 * Circular queues
 * Performance considerations
-* Deque
+<!-- * Deque -->
 <span style="color:#AB0000">Trees</span><BR>
@@ Line 155: / Line 153: @@
 * Height-balanced trees: AVL Trees, RR, RL, LR, LL rotations
 * Height-balanced trees: Red-Black Trees, single promotion, zig-zag promotion, recolouring and restructuring
-* Non-search trees: parse trees, array implementation, linked list implementation
+<!-- * Non-search trees: parse trees, array implementation, linked list implementation -->
-* Forests
+<!-- * Forests -->
 <span style="color:#AB0000">Heaps</span><BR>
@@ Line 167: / Line 165: @@
 * Implementation of heap
 * Heap sort
-* d-ary heaps
+<!-- * d-ary heaps -->
-* Leftist  heaps
+<!-- * Leftist  heaps -->
 <span style="color:#AB0000">Graphs</span><BR>
@@ Line 181: / Line 179: @@
 * Dijkstra's shortest path algorithm
 * Floyd-Warshall's all-pairs algorithm
-* Graphs problems: routes, Hamilton paths, network flows, covering problems, museum guard problem
+<!-- * Graphs problems: routes, Hamilton paths, network flows, covering problems, museum guard problem -->
-* Fleury's Euler circuit algorithm
+<!-- * Fleury's Euler circuit algorithm -->
 <span style="color:#AB0000">Complex Networks</span><BR>
-* Complex systems and large networks
+<!-- * Complex systems and large networks
-* Random networks
+* Community detection
-* Degree distribution
+* Network evolution -->
-* Clustering
+* The importance of complex networks and network science
+* Euler's theorem: the Bridges of Königsberg
+* Networks vs. graphs
+* Degree, average degree, and degree distribution
+* Bipartite networks
+* Path length, BFS, Connectivity, Components
+* Clustering coefficient
+* Random graph model
 * Small world phenomena
 * Scale free networks
+* Communities
+* Fundamental Hypothesis
+* Connectedness and Density Hypothesis
+* Strong and weak communities
+* Graph partitioning
 * Community detection
-* Network evolution
+* Hierarchical clustering
+* Girvan-Newman Algorithm
+* Modularity
+* Random Hypothesis
+* Maximum Modularity Hypothesis
+* Greedy algorithm for community detection by maximizing modularity
+* Overlapping communities
+* Clique percolation algorithm and CFinder
 <span style="color:#AB0000">Hashing</span><BR>
@@ Line 201: / Line 218: @@
 * Hash tables: current value tables, direct access tables
 * Managing collisions: chaining, overflow areas, re-hashing, linear probing, quadratic probing
-* Evaluating hash functions: prime division, mid-square, folding, load factor
+<!-- * Evaluating hash functions: prime division, mid-square, folding, load factor -->
 * Example application: dictionaries
-* Generating hash functions and using hash structures
+<!-- * Generating hash functions and using hash structures -->
 <span style="color:#AB0000">Algorithmic Strategies</span><BR>
@@ Line 213: / Line 230: @@
 * Combinatorial search and backtracking
 * Branch and bound
-* Heuristics and heuristic algorithms
+<!-- * Heuristics and heuristic algorithms -->
-* Probabilistic algorithms
+<!-- * Probabilistic algorithms -->
 <span style="color:#AB0000">Analysis of Correctness</span><BR>
@@ Line 228: / Line 245: @@
 * Verification and validation strategies
+<!--
 <span style="color:#AB0000">Databases</span><BR>
 * Databases: relational databases, hierarchical databases, NoSQL databases
@@ Line 233: / Line 251: @@
 * Compression strategies, dictionary algorithm, LZ algorithm
 * File structure strategies.
+-->
+<!--
 <span style="color:#AB0000">Programming Paradigms</span><BR>
 * Imperative programming
@@ Line 241: / Line 260: @@
 * OO design methodology: UML class diagram, composite-structure diagram, architecture diagram, package diagram, object diagram, component diagram, deployment diagram, activity diagram, sequence diagram, communication diagram, interaction diagram, timing diagram, use case diagram, state machine diagram
 * OO design principles: open/close principle, design by contract principle, dependency inversion principle, other design principles, documentation
+-->
 <span style="color:#AB0000">Automata Theory</span><BR>
 * Regular Languages
@@ Line 273: / Line 292: @@
 ==<span style="color:#AB0000">Student assessment</span> ==
-This course includes several hands-on programming and analysis assignments. Students will program mainly in C/C++. The programming assignments include individual assignments and a team capstone project in teams of 2-3 people. In addition to programming assignments, students will be assigned readings to support the lecture material.
+This course includes several hands-on programming and analysis assignments. Students will program in C/C++. The programming assignments include seven individual assignments. In addition to programming assignments, students will be assigned readings to support the lecture material.
 Marks will be awarded as follows.
-Individual Assignments 70%
+Best six of seven individual assignments: 60%.
-Final Capstone Project  30% (The capstone project will be completed in 2-3 person teams).
+Mid-semester examination: 10%.
+Final examination: 30%.
 <!-- Instructor Judgement 10% (We will use time tracking and observation to determine this part of the grade). -->
@@ Line 297: / Line 317: @@
 ==<span style="color:#AB0000">Acknowledgments</span> ==
-This syllabus is based mainly on Course 04-630 Computer Science Principles for Practicing Engineers given by Mel Rosso-Llopart and Anthony J. Lattanze at Carnegie Mellon University.
+This syllabus is based mainly on Course 04-630 Computer Science Principles for Practicing Engineers given by Mel Rosso-Llopart and Anthony J. Lattanze at Carnegie Mellon University, USA, Course CS-CO-412 Algorithms and Data Structures given by David Vernon at Innopolis University, Russia, and Course IT706A Scientific Theory in Informatics given by David Vernon and others at the University of Skövde, Sweden.
-Additional topics and teaching material have been taken from Course CS-CO-412 Algorithms and Data Structures given by David Vernon at the Innopolis University.

Difference between revisions of "Data Structures and Algorithms for Engineers"

Latest revision as of 09:00, 22 November 2017

Contents

Course description

Learning objectives

Outcomes

Content details

Lecture Schedule

Faculty

Delivery

Student assessment

Software tools

Course texts

Acknowledgments

Navigation menu

Personal tools

Namespaces

Variants

Views

Actions

Search

Navigation

Tools