improved README and fixed many things

Régis Behmo 2021-06-08 16:23:33 +02:00
parent e8d3008dea
commit cb99cb9242
17 changed files with 962 additions and 867 deletions


@ -1,46 +1,101 @@
Tutor Cairn: scalable, real-time analytics for Open edX
========================================================
Cairn: scalable, real-time analytics for Open edX
==================================================
TODO: Sweet readme
Analytics are an essential component of an online learning platform: you need to know whether your courses are effective and which parts need improvement. You need to know if your students are falling behind, and if they are, you need to detect the early warning signs. When your courses are successful, you want to get periodic engagement reports.
We created a tool to help you answer all these questions. Cairn is a Tutor plugin that you install on top of an Open edX platform and that gives you access to a powerful, full-blown analytics stack. Cairn comes with the following features out of the box:
🖴 **Unified datalake of learner events and stateful data**: both learner-triggered events, coming from the Open edX tracking logs, and stateful data, coming from the existing databases, are available for querying in a single unified interface. This means that you can, for instance, query the grades of the students that visited your platform in the past 24 hours, or collect the email addresses of the students who did not yet complete the latest assignment.
⚡ **Real-time:** new events are visible immediately in your analytics interface. No more waiting for slow batch jobs to complete!
🔑 **Course- and org-based data access rights:** your course staff is granted access only to the data rows that concern them. Cairn makes it easy to create new users with granular access permissions.
🎁 **Working dashboards out of the box:** Cairn comes with a fully functional dashboard that you can start playing with right away.
🛠️ **Fully customizable data and dashboards:** your data scientists, business intelligence team and other tinkerers can freely explore your course data, create and share their own queries, datasets and dashboards. All it takes is a little bit of SQL.
🚀 **Scalable:** Cairn scales as much as its backend, which was designed for Internet scale.
Cairn vs alternatives
---------------------
========================================== ===== =================================================================================== ===================================================
List of features Cairn `Open edX Insights <https://edx.readthedocs.io/projects/edx-insights/en/latest/>`__ `Figures <https://github.com/appsembler/figures>`__
========================================== ===== =================================================================================== ===================================================
Event aggregation ✅ ✅ ❌
Real-time data ✅ ❌ ✅
Easy to install ✅ ❌ ✅
Custom queries and dashboards ✅ ❌ ❌
Works with the latest Open edX versions ✅ ✅ ❌
========================================== ===== =================================================================================== ===================================================
How does Cairn work?
--------------------
Cairn uses the same collect/store/expose paradigm made popular by other frameworks such as the `ELK Stack <https://www.elastic.co/fr/elastic-stack>`__ -- except that all the components are different and better suited to Open edX:
- On the server side, tracking logs are collected by `Vector <https://vector.dev/>`__, an efficient, cloud-native log collector.
- Tracking log events are then stored in a `Clickhouse <https://clickhouse.tech/>`__ table, which is the cornerstone of Cairn. Clickhouse also exposes MySQL data via live and materialized views. This is the magic piece of the puzzle which allows us to join event and MySQL data.
- The data inside Clickhouse is made visible to the end-users in a `Superset <https://superset.apache.org/>`__ frontend.
Installation
------------
::
Cairn requires a `Tutor Wizard Edition license <https://overhang.io/tutor/wizardedition>`__. Once you have enabled your license, installing the plugin is as simple as running::
tutor license install tutor-cairn
Usage
-----
::
Getting started
~~~~~~~~~~~~~~~
Enable the plugin with::
tutor plugins enable cairn
Then, restart your platform and run the initialization scripts::
tutor local quickstart
Create credentials to access the Clickhouse database::
tutor local run cairn-clickhouse cairn createuser YOURUSERNAME
Create an admin user to access the frontend::
# You will be prompted for a new password
tutor local run cairn-superset superset fab create-admin --username yourusername --email user@example.com
tutor local run cairn-superset cairn createuser --admin YOURUSERNAME YOURUSERNAME@YOUREMAIL.COM
You can then access the frontend with the user credentials you just created. Open http(s)://data.<YOUR_LMS_HOST> in your browser. When running locally, this will be http://data.local.overhang.io. The admin user will automatically be granted access to the "openedx" database in Superset and will be able to query all tables.
Management
----------
To import the "Course overview" dashboard that comes with Cairn, run::
tutor local run cairn-superset cairn bootstrap-dashboards YOURUSERNAME /app/bootstrap/courseoverview.json
Some event data will be missing from your dashboards: just start using your LMS and refresh your dashboard. The new events should appear immediately.
.. image:: https://overhang.io/static/catalog/screenshots/cairn.png
:alt: Alpine cairn
Data-based access control
~~~~~~~~~~~~~~~~~~~~~~~~~
Most of your users should probably not have access to all data from all courses. To restrict a given user to one or more courses or organizations, select the course IDs and/or organization IDs to which the user should have access and create a user with limited access to the datalake::
tutor local run cairn-clickhouse cairn createuser --course-id='course-v1:edX+DemoX+Demo_Course' --org-id='edX' yourusername
tutor local run cairn-clickhouse cairn createuser --course-id='course-v1:edX+DemoX+Demo_Course' --org-id='edX' YOURUSERNAME
Then, create the corresponding user on the frontend::
Then, create the corresponding user on the frontend with the same command as above (but without the ``--admin`` option)::
tutor local run cairn-superset cairn createuser yourusername yourusername@youremail.com
tutor local run cairn-superset cairn createuser YOURUSERNAME YOURUSERNAME@YOUREMAIL.COM
Your frontend user will automatically be associated to the datalake database you created, provided they share the same name.
Your frontend user will automatically be associated to the datalake database you created.
Cairn comes with a convenient pre-built dashboard that you can add to any user account::
tutor local run cairn-superset cairn bootstrap-dashboards yourusername /app/bootstrap/courseoverview.json
Refreshing course block data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Course block IDs and names are loaded from the Open edX modulestore into the datalake. After making changes to your course, you might want to refresh the course structure stored in the datalake. To do so, run::
@ -53,18 +108,20 @@ Or, if you want to avoid running the full plugin initialization::
-v $(tutor config printroot)/env/plugins/cairn/apps/clickhouse/auth.json:/openedx/clickhouse-auth.json \
lms python /openedx/scripts/importcoursedata.py
Running on Kubernetes
~~~~~~~~~~~~~~~~~~~~~
When running on Kubernetes instead of locally, most commands above can be rewritten with ``tutor k8s exec service "command"`` instead of ``tutor local run service command``. For instance::
# Privileged user creation
tutor k8s exec cairn-superset "superset fab create-admin --username yourusername --email user@example.com"
tutor k8s exec cairn-superset "superset fab create-admin --username YOURUSERNAME --email user@example.com"
# Unprivileged user creation
tutor k8s exec cairn-clickhouse "cairn createuser --course-id='course-v1:edX+DemoX+Demo_Course' --org-id='edX' yourusername"
tutor k8s exec cairn-superset "cairn createuser yourusername yourusername@youremail.com"
tutor k8s exec cairn-clickhouse "cairn createuser --course-id='course-v1:edX+DemoX+Demo_Course' --org-id='edX' YOURUSERNAME"
tutor k8s exec cairn-superset "cairn createuser YOURUSERNAME YOURUSERNAME@YOUREMAIL.COM"
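The rewriting rule above is mechanical enough to sketch in a few lines of Python. This is only an illustration: ``local_run_to_k8s_exec`` is a hypothetical helper, not part of Tutor or Cairn, and it assumes the command has the exact ``tutor local run service command`` shape. It emits single quotes around the inner command (via ``shlex.quote``) where the examples above use double quotes; both forms pass the command to the service as one argument.

```python
import shlex


def local_run_to_k8s_exec(command):
    """Rewrite `tutor local run SERVICE ARGS...` as `tutor k8s exec SERVICE 'ARGS...'`.

    Hypothetical helper for illustration only; it is not shipped with Tutor or Cairn.
    """
    parts = shlex.split(command)
    if parts[:3] != ["tutor", "local", "run"] or len(parts) < 5:
        raise ValueError("expected: tutor local run SERVICE COMMAND...")
    service, args = parts[3], parts[4:]
    # Quote each argument, then quote the whole inner command so that it
    # reaches the service as a single string.
    inner = " ".join(shlex.quote(arg) for arg in args)
    return "tutor k8s exec {} {}".format(service, shlex.quote(inner))


print(
    local_run_to_k8s_exec(
        "tutor local run cairn-superset cairn createuser YOURUSERNAME YOURUSERNAME@YOUREMAIL.COM"
    )
)
```

As the text says, only *most* commands translate this way, so treat the output as a starting point and check it against the command you actually need.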
Development
-----------
To reload Vector configuration after changes to vector.toml, run::
tutor config save && tutor local exec cairn-vector sh -c "kill -s HUP 1"
@ -77,6 +134,13 @@ To launch a Python shell in Superset, run::
tutor local run cairn-superset superset shell
.. image:: https://overhang.io/static/catalog/img/cairn.png
:alt: Alpine cairn
Support
-------
Are you having trouble with Cairn? Do you have questions about this plugin? Please get in touch with us at contact@overhang.io. Community support is also available on the official Tutor forums: https://discuss.overhang.io
License
-------

Binary file not shown (96 KiB).

Binary file not shown (61 KiB).

Binary file not shown (46 KiB).


@ -39,7 +39,7 @@ setup(
packages=find_packages(exclude=["tests*"]),
include_package_data=True,
python_requires=">=3.5",
install_requires=["tutor-openedx"],
install_requires=["tutor-openedx>=11.0.0,<12.0.0"],
entry_points={"tutor.plugin.v0": ["cairn = tutorcairn.plugin"]},
classifiers=[
"Development Status :: 3 - Alpha",


@ -85,14 +85,19 @@ spec:
- name: SYSFS_ROOT
value: /host/sys
volumeMounts:
- name: data
mountPath: /var/lib/vector
- name: var-log
mountPath: /var/log/
readOnly: true
- mountPath: /etc/vector/vector.toml
name: config
subPath: vector.toml
subPath: k8s.toml
readOnly: true
volumes:
- name: data
persistentVolumeClaim:
claimName: cairn-vector
- name: var-log
hostPath:
path: /var/log/


@ -1,3 +1,17 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: cairn-vector
labels:
app.kubernetes.io/component: volume
app.kubernetes.io/name: cairn-vector
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
{% if CAIRN_RUN_CLICKHOUSE %}
---
apiVersion: v1


@ -1,6 +1,6 @@
- name: cairn-vector-config
files:
- plugins/cairn/apps/vector/vector.toml
- plugins/cairn/apps/vector/k8s.toml
- name: cairn-clickhouse-user-config
files:
- plugins/cairn/apps/clickhouse/users.d/cairn.xml


@ -4,7 +4,8 @@
cairn-vector:
image: docker.io/timberio/vector:0.13.X-alpine
volumes:
- ../plugins/cairn/apps/vector/vector.toml:/etc/vector/vector.toml:ro
- ../../data/cairn/vector:/var/lib/vector
- ../plugins/cairn/apps/vector/local.toml:/etc/vector/vector.toml:ro
{% if CAIRN_DOCKER_HOST_SOCK_PATH %}- {{ CAIRN_DOCKER_HOST_SOCK_PATH }}:/var/run/docker.sock:ro{% endif %}
environment:
- DOCKER_HOST=/var/run/docker.sock
@ -52,7 +53,7 @@ cairn-superset-worker-beat:
- cairn-redis
- cairn-postgresql
cairn-redis:
image: docker.io/redis:5.0-alpine
image: docker.io/redis:6.2.4-alpine
restart: unless-stopped
{% if CAIRN_RUN_POSTGRESQL %}
cairn-postgresql:


@ -38,7 +38,7 @@ def import_course(course_key):
course = get_course(course_key, depth=None)
print("======================", course_id, course.display_name)
values = [
sql_query(
sql_format(
"('{}', '{}', '{}', '{}', '{}', '{}')",
course_id,
str(child.location),
@ -54,12 +54,12 @@ def import_course(course_key):
f"Inserting {len(values)} items in course_blocks for course '{course_id}'..."
)
make_query(
sql_query(
sql_format(
"ALTER TABLE course_blocks DELETE WHERE course_id = '{}';",
course_id,
),
)
insert_query = sql_query(
insert_query = sql_format(
"INSERT INTO course_blocks (course_id, block_key, block_id, position, display_name, full_name) VALUES "
)
insert_query += ", ".join(values)
@ -74,7 +74,7 @@ def iter_course_blocks(item, prefix=""):
yield from iter_course_blocks(child, prefix=prefix)
def sql_query(template, *args, **kwargs):
def sql_format(template, *args, **kwargs):
args = [sql_escape_string(arg).decode() for arg in args]
kwargs = {key: sql_escape_string(value).decode() for key, value in kwargs.items()}
return template.format(*args, **kwargs)
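
The renamed ``sql_format`` helper escapes every positional and keyword argument before interpolating it into the template. The sketch below shows that behaviour in isolation. The ``sql_escape_string`` defined here is an assumed stand-in, since the real one is imported elsewhere in the script and is not visible in this diff; it only escapes backslashes and single quotes and returns bytes to match the ``.decode()`` calls.

```python
def sql_escape_string(value):
    """Assumed stand-in for the script's real escaping function (not shown in the diff).

    Escapes backslashes and single quotes, and returns bytes.
    """
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return text.encode()


def sql_format(template, *args, **kwargs):
    # Escape every argument before it is substituted into the template.
    args = [sql_escape_string(arg).decode() for arg in args]
    kwargs = {key: sql_escape_string(value).decode() for key, value in kwargs.items()}
    return template.format(*args, **kwargs)


query = sql_format(
    "ALTER TABLE course_blocks DELETE WHERE course_id = '{}';",
    "course-v1:edX+DemoX+Demo_Course",
)
print(query)

# A value containing a quote cannot close the SQL string literal early.
print(sql_format("SELECT '{}'", "O'Brien"))
```

Escaping at format time keeps each call site short, but it relies on every template wrapping its placeholders in quotes, as the templates in this script do.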

File diff suppressed because one or more lines are too long


@ -0,0 +1,12 @@
{% include "cairn/apps/vector/partials/common-pre.toml" %}
### Sources
# Capture logs from kubernetes
[sources.kubernetes_logs]
type = "kubernetes_logs"
[transforms.openedx_containers]
type = "filter"
inputs = ["kubernetes_logs"]
condition = '.kubernetes.pod_namespace == "{{ K8S_NAMESPACE }}" && includes(["lms", "cms"], .kubernetes.container_name)'
{% include "cairn/apps/vector/partials/common-post.toml" %}


@ -0,0 +1,12 @@
{% include "cairn/apps/vector/partials/common-pre.toml" %}
### Sources
# Capture logs from all docker containers
[sources.docker_logs]
type = "docker_logs"
[transforms.openedx_containers]
type = "filter"
inputs = ["docker_logs"]
condition = 'includes(["lms", "cms"], .label."com.docker.compose.service")'
{% include "cairn/apps/vector/partials/common-post.toml" %}


@ -1,32 +1,9 @@
# Vector's API for introspection
[api]
enabled = true
address = "127.0.0.1:8686"
### Sources
# Capture logs from all containers
[sources.docker_logs]
type = "docker_logs"
[sources.kubernetes_logs]
type = "kubernetes_logs"
### Transforms
# Select lms & cms containers
[transforms.openedx_docker_containers]
type = "filter"
inputs = ["docker_logs"]
condition = 'includes(["lms", "cms"], .label."com.docker.compose.service")'
[transforms.openedx_kubernetes_containers]
type = "filter"
inputs = ["docker_logs", "kubernetes_logs"]
condition = '.kubernetes.pod_namespace == "{{ K8S_NAMESPACE }}" && includes(["lms", "cms"], .kubernetes.container_name)'
# Parse tracking logs: extract time
[transforms.tracking]
type = "remap"
inputs = ["openedx_docker_containers", "openedx_kubernetes_containers"]
inputs = ["openedx_containers"]
# Time formats: https://docs.rs/chrono/0.4.19/chrono/format/strftime/index.html#specifiers
source = '''
parsed, err_regex = parse_regex(.message, r'^.* \[tracking\] [^{}]* (?P<tracking_message>\{.*\})$')
@ -62,10 +39,9 @@ source = '''
# Log all events to stdout, for debugging
[sinks.out]
type = "console"
inputs = ["openedx_kubernetes_containers"]
# inputs = ["tracking_debug"]
inputs = ["tracking_debug"]
encoding.codec = "json"
# encoding.only_fields = ["time", "message.context.course_id", "message.context.user_id", "message.name"]
encoding.only_fields = ["time", "message.context.course_id", "message.context.user_id", "message.name"]
# # Send logs to clickhouse
[sinks.clickhouse]
@ -78,4 +54,4 @@ database = "{{ CAIRN_CLICKHOUSE_DATABASE }}"
table = "_tracking"
healthcheck = true
{{ patch("cairn-vector-toml") }}
{{ patch("cairn-vector-common-toml") }}


@ -0,0 +1,6 @@
data_dir = "/var/lib/vector/"
# Vector's API for introspection
[api]
enabled = true
address = "127.0.0.1:8686"


@ -1,7 +1,7 @@
# Superset image with additional database drivers
# https://hub.docker.com/r/apache/superset
# https://superset.apache.org/docs/databases/installing-database-drivers
FROM docker.io/apache/superset:a9d888ad402ebb35da45df446997c426d6abee9d
FROM docker.io/apache/superset:0e07a5ca03cb2a6f560b77847c13413b9a8c7d97
USER root
# https://pypi.org/project/clickhouse-driver/


@ -40,6 +40,11 @@ def main():
" Defaults to the username."
),
)
parser_user.add_argument(
"--admin",
action="store_true",
help=("Make the user an administrator."),
)
parser_user.add_argument(
"-r",
"--role",