FL WG,

This email is a follow-up to today's meeting.

To recap today's discussion: we need to decide on a schema that we can reference when labeling data and content. The first step in schema development is identifying the content we will be interacting with. Once we have identified the content, we can choose which labels matter for it. Those aggregated labels and their data types then become the schema.
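
To make that last step concrete, here is a minimal sketch of a schema as a mapping from label names to data types, plus a check of one record against it. Everything in it is an assumption for illustration only: the label names (`name`, `mime_type`, `num_records`) and the `validate` helper are hypothetical and not something the group has agreed on.

```python
from typing import Any

# Hypothetical schema: each label we decided matters, mapped to its data type.
DATASET_SCHEMA: dict[str, type] = {
    "name": str,
    "mime_type": str,
    "num_records": int,
}


def validate(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of problems found when checking a record against a schema."""
    problems = []
    # Every label in the schema must be present with the expected type.
    for label, expected in schema.items():
        if label not in record:
            problems.append(f"missing label: {label}")
        elif not isinstance(record[label], expected):
            actual = type(record[label]).__name__
            problems.append(f"{label}: expected {expected.__name__}, got {actual}")
    # Labels that are not part of the schema are reported as well.
    for label in record:
        if label not in schema:
            problems.append(f"unknown label: {label}")
    return problems
```

An empty list from `validate` means the record uses exactly the agreed labels with the agreed types. A real schema would likely need nested and optional fields, which this sketch leaves out.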

I think a great example of this, one that the FL working group might be familiar with, is the COCO data format (https://cocodataset.org/#format-data). Another example of a very comprehensive schema is Schema.org, the vocabulary used on the semantic web (https://schema.org/docs/full.html).

I don't know much about the content types that the FL initiative will be handling, but some examples off the top of my head are:

1. Models - How would we refer to whole models, subsets of those models, and supersets thereof? This seems important for model selection, and it also seems important for correlating models with attributes of the datasets they have been trained on or otherwise interacted with.

2. Ancillary content to models - Dependencies, cross-validations, and other contextual information pertinent to the model.

3. Datasets - COCO provides a good reference here. This might just come down to deciding how we would refer to the various MIME types of the dataset content.
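
As a rough sketch of how the three content types above could reference each other, the records below link a model to its parent model, its ancillary dependencies, and the datasets it was trained on. All class names, field names, and the `mime_types_seen_by` helper are hypothetical; this is one possible shape under those assumptions, not a proposal for the final schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    dataset_id: str
    mime_types: list[str]  # MIME types of the content in this dataset


@dataclass
class Model:
    model_id: str
    # Set when this model is a subset of a larger model (hypothetical convention).
    parent_id: Optional[str] = None
    # Ids of the datasets this model has been trained on.
    trained_on: list[str] = field(default_factory=list)
    # Ancillary content, e.g. dependency names.
    dependencies: list[str] = field(default_factory=list)


def mime_types_seen_by(model: Model, datasets: dict[str, Dataset]) -> set[str]:
    """Collect the MIME types of every known dataset the model was trained on."""
    seen: set[str] = set()
    for dataset_id in model.trained_on:
        dataset = datasets.get(dataset_id)
        if dataset is not None:  # skip ids that are not in the catalog
            seen.update(dataset.mime_types)
    return seen
```

Referring to datasets by id keeps a model record small while still allowing a model to be correlated with the attributes of its training data.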

Hopefully this is a good introduction to the concept and gets the discussion started.

We'll need a project workspace where we can aggregate member input and start schema construction.

@Heiko Ludwig, how would you like us to get started?

Alex Flom

(they/them)

Principal Product Security Engineer

Red Hat

Fort Collins, CO

aflom@redhat.com
M: 970.443.0191