Help
Basics
1. Is it OK to have sensitive materials in the corpus?
Yes, it is okay to have some sensitive material in the corpus. The material used to generate teaching materials and complements, such as presentation decks, is not used for any other purpose, such as model training. However, as a matter of good information security practice, take care to exclude personally identifiable sensitive information from the corpus (e.g., names are fine, medical records are not). Most companies dealing with highly sensitive information use private models, whereas CorpusKey uses a mix of public and private models. The natural language processing providers for CorpusKey do not train future models on the material used for generating materials with CorpusKey.
2. Will the language models explain things not in the corpus?
The models rely primarily on information from the created corpus to answer questions. If a learning objective requests information that is not included in the corpus, the language models will look outside the corpus and may not address the learning objective accurately. In general, it is unlikely that the models will provide incorrect information or hallucinate details. However, as with all large language models, a learning objective in a module outline that asks for a specific fact, such as a case law citation, can generate a convincing-looking but likely wrong result. The models may occasionally look outside the corpus for additional information, but will not contradict what is represented in the corpus.
3. What if I already have a good textbook?
If you already have a good textbook but it does not come with additional resources such as question banks or PowerPoint slides, you can still benefit from using CorpusKey. You can use the text of your book to generate questions, whether multiple-choice or short answer, for your course. It is not necessary for instructors to have created an entire corpus or student readings for this to work. If you have a particular book that you like to use and have access to its text, you can use that text to generate teaching complements such as mini-cases or other interactive activities for your students.
4. What is a teaching module?
A teaching module is a structured unit of instructional material designed to facilitate learning in a specific subject or topic. It typically includes learning objectives, content, activities, assessments, and resources. Teaching modules are often used by educators to deliver consistent and organized instruction to students. They can be created from scratch or customized to meet the specific needs of a course or curriculum.
5. What is a mini-case?
A mini-case is a summary of a news or current event and its relationship to course materials, along with a summary of a concept from the course material. It consists of four slides: the first is a title slide showing the URL of the current event, the second presents the concept and its summary, the third summarizes the event itself, and the fourth explains how the event relates to the course material. Mini-cases can be created quickly and are often used to demonstrate the relevance of course readings to current events. Instructors can use any available text, such as a preferred book, to generate these mini-cases.
6. What is needed for a mini-case?
For a mini-case, you will typically need a corpus, a URL to a news story or current event that can be web-scraped, and a concept name from the book or corpus. For more common and generally agreed-upon concepts, a corpus may not be necessary.
Copyrights
1. How do I handle copyrights?
When it comes to handling copyrights, it is important to respect and adhere to copyright laws, which often allow for derivative products. Instructors typically have access to educational materials that may be covered under educational Creative Commons licensing. As long as the instructor is not trying to undermine the revenue stream intended for the copyright owner and is transforming the material sufficiently for educational purposes, they usually have the right to use such material. However, it is the responsibility of the instructor to ensure that they are not violating copyright laws. Instructors are trained to know when they need to properly cite material and to understand elements of copyright law and fair use. Most instructors fully respect copyright protection.
2. What if I want to put copyrighted material in my corpus?
If you want to include copyrighted material in your corpus, it is important to handle copyrights responsibly and ethically. Instructors using CorpusKey should not violate copyright laws. Typically, instructors have a set of readings or research papers they wish to use for the corpus. If the intent is to synthesize this information for students, it is generally acceptable. However, instructors should be aware of the boundaries of educational use under Creative Commons licenses or as set by copyright owners. It is the responsibility of instructors or users to fully understand the rights and responsibilities of copyright law. Instructors cannot use CorpusKey to subvert copyrighted material written by other instructors and published for a fee. CorpusKey does not provide citations, but when material is used to answer a specific learning objective, the source material is listed under a 'further reading' section. Instructors should edit this section to ensure proper reference to the material. CorpusKey has no responsibility for the material entered into the corpus by an instructor. It is expected that instructors have obtained copyright permission and/or will sufficiently transform the material to comply with the copyright permissions they have.
3. What is the Further Reading section?
Further readings are additional sources of information that can provide more in-depth knowledge or different perspectives on a particular topic. Materials are listed in the Further Reading section when they are present in the corpus and relevant to the material generated by the instructor. Use of the Further Reading section is optional for the instructor, but it is included by default in the generated material. Please note that these sources are referenced by the file name used by the instructor. To assist with the Further Reading section of the student readings, instructors may wish to use a file naming convention such as author_year or author_title_year when adding material to the corpus. This might reduce the amount of editing necessary to make the Further Reading section more useful.
Corpus Development
1. What is a corpus?
A corpus is a curated collection of written or transcribed texts that is used as the basis for your teaching modules. It contains what the instructor considers high-quality material and serves as the reference standard for content generation. It is the body of material from which students should learn and draw information, rather than relying on freely available general information, which may lack specificity or simply be inappropriate for instruction.
2. What can be included in the corpus?
The corpus can include various types of text, such as articles, books, research papers, websites, blogs, forum discussions, social media posts, and any other relevant written materials. It can be as narrow or broad as the learning objectives for the course.
3. Can images go into the corpus?
Images can be included in the corpus, but they are used within the text as illustrations, figures, or tables. Due to technology limitations, only limited OCR is possible, so insights from figures and tables must be added to the corpus separately. Images in the corpus are inserted into documents and presentation decks by a simple reference to the file name in the outline.
4. How will I know if a document can be read?
While .txt and .doc files do not typically have readability issues, PDFs are formed in a variety of ways, including scans of images. Once you drop a document into CorpusKey, the system will work in the background to determine the readability of each item in the document. You will receive a readability score for each item, which indicates whether it can be read. A low readability score suggests that there are parts of the document that cannot be translated into text. Most texts have some characters that cannot be read, and a good readability score is anywhere from 80-95%. Instructors should not be overly concerned with lower scores and should simply inspect the document: the important part of the text may be readable, while things like tables in landscape mode can cause issues with character recognition. If there is an error in machine readability that renders a document fully unusable, you will be notified in the workflow. It's important to note that CorpusKey is not designed to handle images, so image handling must be done separately.
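CorpusKey's own scoring method is not described here, so the following is only a minimal sketch of how a readability percentage could be estimated from extracted text. The function name readability_score and the rule it applies (counting replacement and control characters as unreadable) are assumptions for illustration, not the product's implementation.

```python
import unicodedata


def readability_score(text: str) -> float:
    """Return the percentage of characters that look like readable text.

    Hypothetical rule: the Unicode replacement character (U+FFFD) and
    control/unassigned characters count as unreadable; ordinary
    whitespace and everything else counts as readable.
    """
    if not text:
        return 0.0
    readable = 0
    for ch in text:
        if ch in " \n\t":
            # Ordinary whitespace is expected in any document.
            readable += 1
        elif ch == "\ufffd":
            # Replacement character: the extractor could not decode this.
            continue
        elif unicodedata.category(ch).startswith("C"):
            # Control, private-use, or unassigned code points.
            continue
        else:
            readable += 1
    return 100.0 * readable / len(text)


if __name__ == "__main__":
    sample = "Price = \ufffd\ufffd per unit"
    print(f"{readability_score(sample):.1f}%")
```

A score computed this way only flags how much of the extracted text is damaged, not which part, which is why inspecting the document itself remains the recommended check.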
5. How will I know when a corpus is complete?
You can determine if a corpus is complete by examining the material produced when you create a learning objective. If the material drawn from the corpus does not address the requested explanation or is off-topic, it indicates that the corpus may be incomplete. However, it is also important to consider whether the learning objective itself has been correctly specified. Before concluding that the corpus is deficient, you should first check if the learning objective can be modified to better align with the desired results.
Get Started
1. How do I get started?
To get started, you will first need to build your corpus of materials and define your learning objectives for the course. Once you have a clear understanding of what you want your students to achieve, you can begin building your teaching modules. It is often helpful to run the teaching modules and review the output to ensure that your learning objectives are being addressed. If necessary, you can adjust the learning objectives or modify the corpus to align with your desired outcomes. After the first review, you may want to add more specificity to the outline. Once you have the material defined, you can work on developing complementary resources and synthesizing the material in a consistent voice.
2. How do I build a corpus?
To build a corpus, you would follow these steps:
- Gather relevant files: Collect the documents or texts that you think are relevant for your teaching modules. These files can be in various formats, such as text documents (.txt, .doc) or PDFs.
- Drop the files into the corpus folder. This folder will serve as the storage location for your corpus. Instructors can build multiple corpora for different topics or purposes. Each corpus is independent but tied to a project.
- The system will check file compatibility to ensure that all the files you have added can be read and processed. If any file is corrupt or protected and cannot be read, the system will notify you that it cannot be read.
After the files are successfully added, you can proceed to build the outlines for your teaching modules. To assist with the Further Reading section of the student readings, instructors may wish to use a file naming convention such as author_year or author_title_year.
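As a rough illustration of why the naming convention helps, the sketch below turns a file name of the form author_year or author_title_year into a more readable reference label. The function further_reading_label, the sample file names, and the output format are hypothetical; CorpusKey itself lists the file name as given.

```python
import re
from pathlib import Path


def further_reading_label(filename: str) -> str:
    """Build a readable label from an author_year or author_title_year file name.

    Hypothetical helper: file names that do not end in a four-digit
    year are returned unchanged (minus the extension).
    """
    stem = Path(filename).stem  # drop the extension, e.g. ".pdf"
    parts = stem.split("_")
    if len(parts) >= 2 and re.fullmatch(r"\d{4}", parts[-1]):
        author = parts[0].capitalize()
        year = parts[-1]
        # Anything between author and year is treated as the title.
        title = " ".join(parts[1:-1]).replace("-", " ").title()
        if title:
            return f"{author}, {title} ({year})"
        return f"{author} ({year})"
    return stem


if __name__ == "__main__":
    for name in ["smith_2020.pdf", "smith_market-structure_2020.pdf", "lecture notes.docx"]:
        print(further_reading_label(name))
```

File names that follow the convention produce labels that need little editing, while a name such as "lecture notes.docx" passes through unchanged and would still need to be rewritten by hand.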
3. When will I need to change the corpus?
You will need to change the corpus when the body of knowledge for a specific topic changes. This ensures that the course material and teaching modules stay up to date with the latest information. Changing the corpus allows for significant enhancements in creating course materials for fast-moving topical areas.
4. Can I create content without a corpus?
Yes, though creating content without a corpus is not recommended. How well it works depends on how uniformly a particular topic is treated within the general body of knowledge the large language model has digested. A corpus is essential for generating accurate and relevant information: it provides a way to bound the body of knowledge used to address learning objectives so that the content is comprehensive and well-informed. While the model may occasionally look outside the corpus for additional information, the corpus remains the primary source for generating content.
5. How long does it take to generate materials?
Generating materials for the course with CorpusKey takes time. The process is not instantaneous and involves a batch process. Typically, it takes minutes to generate materials, depending on the length of the teaching module. However, for larger full-text runs with multiple teaching modules, it could take hours. Instructors will be notified by email when the material is finished.
6. How do I generate questions from my corpus?
Your teaching module outline is used for both multiple-choice and short answer questions: the question builders use the corpus and the outline. If the questions appear fairly general in nature, it is typically because the learning objectives in the outline lack specificity. Adding specifics to the learning objectives helps generate more challenging questions, because there is a relationship between difficulty level and specificity in the learning objective. Knowing this may assist an instructor in generating questions at the right level of detail and difficulty for any specific audience.
7. Can I use the questions without editing?
It is not recommended that you use or upload the question banks without review. While numerous checks are made to generate accurate and valid questions, a few things can still go wrong while the state of the technology continues to improve. Instructors should proof the questions to make sure they are comfortable with the discriminant validity of the answers. This is not dissimilar from any commercial question bank. Instructors should also make sure they are comfortable with the correct answer and feedback. It is rare for our models to mistake an incorrect answer for a correct one when there is a complete corpus and a specific learning objective. However, there can be gaps where information is incomplete, especially if the common use of a term diverges from its use in the specific field. Multiple runs from the same module outline will result in similar questions, typically with variation in phrasing.
Module Development
1. What is a teaching module?
A teaching module is analogous to a book chapter. It is a structured unit of instructional material designed to facilitate learning in a specific topic or subject. A full set of complements can be developed for any teaching module and may include student reading content, presentation decks, assessment tools, and student virtual assistants. Teaching modules are often used by instructors to guide their teaching and provide a framework for students to acquire knowledge and skills in a systematic and organized manner. They can be created from scratch or adapted from existing materials, and can be updated regularly to ensure they remain current and effective.
2. How do I use further readings?
To use further readings, you can refer to the list of additional resources provided at the end of a document or teaching module. These further readings are the titles of the documents that are most relevant to the teaching material created. They can be helpful if you want to explore the topic further or find more in-depth information. Please note that these are referenced by the file name used by the instructor. To assist with the Further Reading section of the student readings, instructors may wish to use a file naming convention such as author_year or author_title_year when adding material to the corpus. This might reduce the amount of editing necessary to make the Further Reading section more useful.
3. How are further readings referenced?
Further readings are referenced by the file name. The file name may not be representative enough of the source material, such as the author, title, and year. If instructors choose to retain further readings, they should edit this section to provide more specific information. The further readings section represents the material used to generate the readings and is most representative of the material that addresses the learning objectives specified by the instructor. At the end of the document, the references for further readings are listed by the title of the document, which is dropped into the corpus. If the title of the document is ambiguous, instructors may relabel the section under further readings to provide clarity for students seeking additional material.
4. Why do my further readings have strange titles versus proper citations?
The further readings section uses the file names uploaded into the corpus by the instructor as references instead of proper citations. File names are often not representative enough of the source material, such as the full title, author, and year. If you need proper citations for the further readings, you may need to edit this section or refer to the instructor for more information.
5. What is a learning objective?
Broadly, a learning objective is a specific statement that describes what learners should be able to do or know after completing a learning experience or course. It outlines the desired outcome and sets clear expectations for what learners should achieve, but it does not need to be phrased in a traditional taxonomy such as Bloom's. Learning objectives help guide instructional design and assessment, ensuring that the content and activities align with the intended learning outcomes. For the purposes of CorpusKey, a learning objective represents something similar. However, you can think of a learning objective as a task for a language model rather than for a student. For example, 'explain [concept]' can work as an objective for either.
6. How should I create a learning objective?
The simplest form for a learning objective is 'explain [concept]' or 'describe [concept]'. Since CorpusKey does natural language processing, these types of learning objectives work well since student readings are typically explanations of parsed, complex problems. It is common to refine and adjust learning objectives as you develop your teaching module. Running an initial version and reviewing the output can help ensure your objectives align with the desired outcomes for your course.
The more specific you are, the more targeted the output. You are not limited to one statement, and you can add as much detail in each learning objective textbox as you would like. For example, 'explain oligopoly, how there are usually very few competitors which may have some levels of differentiation, how having few competitors allows companies the opportunity to better coordinate pricing or to engage in tacit collusion, and how the Herfindahl index ranges from 0.2 to 0.6' will get a more targeted answer than 'explain oligopoly'. If you are less specific, you may find suggested material in the output. While tight specificity allows instructors to parse topics well, it is easy to forget items that we take for granted but should explicitly include.
7. How do I change a learning objective?
Learning objective modification is straightforward, as the objectives are listed in the outline view. Use the pencil icon for editing. Do not forget to save or turn on auto-save. Once you review the module output from the model, you can determine how to modify the original learning objective for that section. Common modifications include or exclude certain points. For example: 'Please explain the relationship of stock price to company earnings. Please include the role of how interest rates will affect stock price due to interest rate linkage to discounted cash flow'.
8. What if I don't like the output and want to change a learning objective?
If you don't like the output and want to change a learning objective, you have full flexibility to edit, re-order, delete, or add material. The experience of many faculty who have built teaching modules from scratch suggests that they often need to adjust the learning objectives to include or exclude certain topics. It is recommended to run the module first and review the output to ensure that the desired learning objectives are being achieved. Once you have identified any necessary adjustments, you can modify the learning objectives accordingly. Keep in mind that natural language processing may generate different answers for the same learning objective, but the changes will likely represent a similar style and range of explanation using different words.
What we do
1. What if I have a book but I don't have a question bank?
If you have a book but it doesn't come with a question bank, you can still benefit from using CorpusKey. You can use the text of the book to generate your own questions, whether they are multiple-choice or short answer questions, for your course. Even if your book is complete but lacks additional materials like question banks or PowerPoint slides, you can still make use of CorpusKey to enhance your teaching and student experience.
2. What does CorpusKey do?
CorpusKey is a company that utilizes a combination of public and private models to provide natural language processing services. We generate teaching materials and complementary resources, such as presentation decks, using proprietary methods. We do not train future models on the material used for generating teaching materials.
3. What types of things can CorpusKey produce?
CorpusKey is designed to generate student readings by teaching module (chapter), along with all the necessary complements such as question banks and slide decks. We can develop question banks linking your text and the cases you have done during the semester. CorpusKey can produce live mini-cases, which integrate current events, and even full-length cases. We have rolling releases of assessment tools such as short and long form answer feedback and grading. As of this writing, only the teaching module material is live.
4. Is CorpusKey only used for student readings?
No, CorpusKey is not only used for student readings. While it was designed to develop course materials when there was no suitable textbook, it can also be used to create corporate training materials, to act as a grading assistant, and to generate test banks, presentation decks, cases, and more. It can also be used as a complement to existing textbooks.
5. What is produced for a mini-case?
A mini-case consists of four slides: the first is a title slide showing the URL of the current event, the second presents the concept and its summary, the third summarizes the event itself, and the fourth explains how the event relates to the course material. Mini-cases can be created quickly and are often used to demonstrate the relevance of course readings to current events. Instructors can use any available text, such as a preferred book, to generate these mini-cases.