Datashim

Datashim is an LF AI Foundation incubation project licensed under the Apache 2.0 license. It enables and accelerates data access for Kubernetes/OpenShift workloads in a transparent and declarative way.

Datashim is a Kubernetes framework that provides easy access to S3 and NFS Datasets from within pods. It orchestrates the provisioning of the Persistent Volume Claims and ConfigMaps needed for each Dataset. The framework introduces the Dataset CRD, which is a pointer to an existing S3 or NFS data source, and includes the logic needed to map each Dataset to a Persistent Volume Claim and a ConfigMap that users can reference in their pods. This lets users focus on developing their workloads rather than on configuring, mounting, and tuning data access. Because it builds on the Container Storage Interface, the framework can be extended to support additional data sources in the future.

Quickstart

Once you have installed Datashim, you can create and use Datasets with a manifest like this:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "{ACCESS_KEY_ID}"
    secretAccessKey: "{SECRET_ACCESS_KEY}"
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"
    readonly: "true" # OPTIONAL, default is false
    region: "" # OPTIONAL

The newly created PVC example-dataset can then be mounted in your Pods to read and write data (read only when readonly is set to "true", as in the example above).
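
As a minimal sketch of how a Pod might consume that PVC, the manifest below mounts it with the standard Kubernetes persistentVolumeClaim volume type. The Pod name, container image, command, and mount path are illustrative assumptions, not taken from the Datashim documentation; only the claim name example-dataset comes from the Dataset above. The Pod is assumed to run in the same namespace as the Dataset.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-dataset-reader        # hypothetical Pod name
spec:
  restartPolicy: Never
  containers:
    - name: reader
      image: busybox                  # assumption: any image with a shell would do
      # List the contents of the mounted Dataset, then exit.
      command: ["sh", "-c", "ls -l /mnt/datasets/example-dataset"]
      volumeMounts:
        - name: dataset
          mountPath: /mnt/datasets/example-dataset   # hypothetical mount path
          readOnly: true              # matches readonly: "true" in the Dataset
  volumes:
    - name: dataset
      persistentVolumeClaim:
        claimName: example-dataset    # PVC created for the Dataset above
        readOnly: true
```

If the Dataset is created without readonly: "true", the two readOnly fields can be dropped so the container can also write to the bucket through the mount.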

GitHub

Visit our repo on GitHub to report issues you encountered while using the framework or to propose new functionality you would like to see. If you would like to get more involved, we look forward to adding you as a contributor!

Join the Conversation

Datashim maintains three mailing lists. You are invited to join whichever best matches your interests.

Datashim-Announce (top-level milestone messages and announcements)

Datashim-TSC (top-level technical governance discussions and decisions)

Datashim-Technical-Discuss (technical discussions and questions)