FL WG,
This email is a follow-up from today's meeting.
To recap the discussion from today: we'll need to decide on a schema that
we can use as a reference when labeling data/content. The first step in
schema development is identifying the content that we are going to be
interacting with. Once we have identified the content, we can choose
which labels are important for that content. Those aggregated labels and
their data types then become the schema.
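To make "labels plus data types" a bit more concrete, here is a rough
Python sketch. The label names (name, mime_type, num_records) and the
validate helper are placeholders I made up for illustration, not a proposal
for the actual schema.

```python
from typing import Any

# Hypothetical labels for one content type, each mapped to its data type.
# These names are illustrative only; the real labels are still to be decided.
dataset_schema = {
    "name": str,
    "mime_type": str,
    "num_records": int,
}


def validate(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for label, expected in schema.items():
        if label not in record:
            problems.append(f"missing label: {label}")
        elif not isinstance(record[label], expected):
            problems.append(
                f"{label}: expected {expected.__name__}, "
                f"got {type(record[label]).__name__}"
            )
    return problems


good = {"name": "example-set", "mime_type": "image/jpeg", "num_records": 100}
bad = {"name": "example-set", "num_records": "100"}

print(validate(good, dataset_schema))
print(validate(bad, dataset_schema))
```

The point is only that once the labels and their types are written down in
one place, checking a piece of labeled content against them becomes
mechanical.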
I think a great example of this, one that the FL working group might be
familiar with, is the COCO data format
(https://cocodataset.org/#format-data).
Another example of a very comprehensive schema is the schema for the
semantic web (https://schema.org/docs/full.html).
I don't know much about the content types that the FL initiative will be
handling, but some examples off the top of my head are:
1. Models - How would we refer to whole models, subsets of those models,
and supersets thereof? This seems important for model selection, and it also
seems important for correlating models with attributes of the datasets
that a model has been trained on or has otherwise interacted with.
2. Ancillary content to models - dependencies, cross-validations, and other
contextual information pertinent to the model.
3. Datasets - COCO provides a good reference here. This might just come
down to deciding how we would refer to the various MIME types of the
dataset content.
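As a starting point for discussion, the three content types above could be
sketched roughly like this. Every class, field, and identifier here (Model,
Dataset, subset_of, trained_on, "ds-images", and so on) is hypothetical; I
am only guessing at what the FL initiative will actually need.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    dataset_id: str
    mime_type: str  # e.g. "image/jpeg"


@dataclass
class Model:
    model_id: str
    subset_of: Optional[str] = None  # id of the enclosing model, if any
    trained_on: list[str] = field(default_factory=list)    # dataset ids
    dependencies: list[str] = field(default_factory=list)  # ancillary content


def supersets(model_id: str, models: dict[str, Model]) -> list[str]:
    """Follow subset_of links upward; return enclosing model ids, nearest first."""
    chain = []
    seen = {model_id}  # guards against accidental cycles
    current = models[model_id].subset_of
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = models[current].subset_of
    return chain


def mime_types_seen(
    model_id: str, models: dict[str, Model], datasets: dict[str, Dataset]
) -> set[str]:
    """MIME types of every dataset the given model was trained on."""
    return {datasets[d].mime_type for d in models[model_id].trained_on}


# Made-up example data.
datasets = {
    "ds-images": Dataset("ds-images", "image/jpeg"),
    "ds-captions": Dataset("ds-captions", "application/json"),
}
models = {
    "full": Model("full", trained_on=["ds-images"]),
    "encoder": Model(
        "encoder", subset_of="full", trained_on=["ds-images", "ds-captions"]
    ),
    "encoder-layer1": Model("encoder-layer1", subset_of="encoder"),
}

print(supersets("encoder-layer1", models))
print(sorted(mime_types_seen("encoder", models, datasets)))
```

Storing only the subset_of link keeps each record small, and supersets can
be derived by walking the links. Whether that is the right trade-off
depends on how model selection will actually query this information.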
Hopefully this is a good intro to the concept and gets the discussion
started.
We'll need a project workspace where we can aggregate member input and
start schema construction.
@Heiko Ludwig <hludwig(a)us.ibm.com>, how would you like us to get started?
--
Alex Flom
(they/them)
Principal Product Security Engineer
Red Hat <https://www.redhat.com/>
Fort Collins, CO
aflom(a)redhat.com <jflowers(a)redhat.com>
M: 970.443.0191