Frequently asked questions
If the FAQ does not answer your question, feel free to contact us any time at email@example.com.
What is Weak Supervision?
Put simply, Weak Supervision is the automatic combination of imperfect and noisy labeling sources, such as labeling functions, into one labeling program. If you want to detect the urgency of a text message, for instance, you could write several functions. One could check whether the text contains exclamation marks, as these typically indicate some kind of urgency (not always, but as mentioned, these rules do not have to be perfect). Another could use the sentiment of the text, which can be calculated with third-party models; urgent messages often have a harsher tone. A third could check whether an "important" flag has been set in the mail. Once you have defined these functions, Weak Supervision combines them into one latent labeling function.
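As a rough illustration, the three urgency heuristics described above could look like the following sketch. Everything here is an assumption made for the example: the record fields (`text`, `sentiment`, `important`), the function names, and the threshold are hypothetical, and a plain majority vote stands in for the combination step. The actual combination used by the platform is not described in this FAQ and is likely more sophisticated.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


def lf_exclamation(record):
    # Exclamation marks often (not always) signal urgency.
    return "urgent" if "!" in record["text"] else ABSTAIN


def lf_sentiment(record):
    # Stand-in for a third-party sentiment model: we assume a
    # precomputed score in [-1, 1] stored under a hypothetical key.
    return "urgent" if record.get("sentiment", 0.0) < -0.5 else ABSTAIN


def lf_important_flag(record):
    # Votes only when the (hypothetical) "important" flag is set.
    return "urgent" if record.get("important") else ABSTAIN


LABELING_FUNCTIONS = [lf_exclamation, lf_sentiment, lf_important_flag]


def majority_vote(record, labeling_functions=LABELING_FUNCTIONS):
    """Combine noisy votes into one label; abstain on no votes or a tie."""
    votes = [lf(record) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    top = Counter(votes).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN  # no clear winner
    return top[0][0]


record = {"text": "Server is down, please call me!", "sentiment": -0.8, "important": False}
label = majority_vote(record)
```

No single function has to be right every time; the combined vote is what ends up as the label.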
What is Active Learning?
In Active Learning, a Machine Learning model is trained while the data is being labeled. This helps annotators concentrate their efforts on the harder examples that the model cannot process reliably on its own. There are several Active Learning strategies; one example is prioritizing the records on which the trained model is least confident.
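The least-confidence strategy can be sketched in a few lines. This is a minimal illustration, assuming the model already produces one list of class probabilities per record; the function name and the sample numbers are made up for the example.

```python
def least_confident_first(probabilities):
    """Return record indices ordered from least to most confident.

    `probabilities` holds one list of class probabilities per record.
    The confidence of a record is the probability of its top class.
    """
    confidence = [max(p) for p in probabilities]
    return sorted(range(len(probabilities)), key=lambda i: confidence[i])


probs = [
    [0.95, 0.05],  # model is quite sure
    [0.55, 0.45],  # model is close to guessing
    [0.70, 0.30],
]
queue = least_confident_first(probs)
```

The annotator would then be shown the records in the order given by `queue`, starting with the one the model is closest to guessing on.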
How can I write Labeling Functions?
Theoretically, any function returning a classification based on some input can be seen as a Labeling Function. We currently provide an interface to declare labeling functions using our Python SDK. Check out our documentation for more information.
I am not sure if my labeling function is correct.
We provide several metrics that help you figure out whether your labeling function is useful. The most important thing to remember is the following: As long as your labeling function performs above chance, it can’t be too wrong!
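To get a feel for what such metrics can tell you, here is a small sketch that computes two common ones by hand: coverage (how often the function votes at all) and precision (how often its votes match a small hand-labeled sample). Which metrics the platform actually reports is not specified here, so treat the names and the sample data as assumptions.

```python
def evaluate_labeling_function(lf, records, gold_labels):
    """Return (coverage, precision) of `lf` on a hand-labeled sample.

    coverage  = share of records the function votes on (non-None)
    precision = share of those votes that match the gold label
    """
    votes = [lf(r) for r in records]
    voted = [(v, g) for v, g in zip(votes, gold_labels) if v is not None]
    coverage = len(voted) / len(records) if records else 0.0
    precision = sum(v == g for v, g in voted) / len(voted) if voted else 0.0
    return coverage, precision


def lf_exclamation(record):
    # Hypothetical labeling function: exclamation mark means "urgent".
    return "urgent" if "!" in record["text"] else None


records = [
    {"text": "Call me now!"},
    {"text": "Thanks for the update."},
    {"text": "Great news!"},
    {"text": "System outage, respond immediately!"},
]
gold = ["urgent", "not urgent", "not urgent", "urgent"]
coverage, precision = evaluate_labeling_function(lf_exclamation, records, gold)
```

In this toy sample the function votes on three of four records and is right on two of those three, which is above the 50% you would expect from guessing between two equally frequent classes.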
You say you use models in the labeling process. Why should I train a new one on the labeled data then?
Technically, you could use our program for inference. However, the best results are achieved when a Supervised Learning classifier is trained on the generated labels, because such a classifier generalizes better to new data.
Is the labeling limited to classification tasks?
Currently, yes. We are focusing on classification for now; Named Entity Recognition is next. If you need a different task, contact us.
Which data formats can be used?
Our platform is designed to work with record-oriented JSON files. Because JSON is a generic format, we can also process other data formats such as XML files, CSV files, Excel spreadsheets and many more. If you are not sure whether your data format fits our platform, don't hesitate to reach out to us.
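If you want to see what "record-oriented" means in practice, the following sketch converts CSV text into a list of JSON objects, one per row, using only the Python standard library. The column names are invented for the example, and the exact schema the platform expects is not specified here.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


csv_text = "id,text\n1,Please respond today!\n2,See you next week\n"
records = csv_to_records(csv_text)

# Record-oriented JSON: a list of objects sharing the same keys.
json_text = json.dumps(records, indent=2)
```

Note that `csv.DictReader` reads every value as a string, so numeric columns would need an explicit conversion.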
How fast will I get my results?
Depending on the complexity of your labeling functions, you can see first results within minutes.
I have fewer than 1,000 data records. Do I need this?
That depends. If you are not planning to collect further records in the future, we'd suggest sticking to tools like Prodi.gy to quickly label your records. However, if these 1,000 records are only the tip of the iceberg and you are going to collect further records, setting up your infrastructure using onetask is a great idea!
Will I receive personal support?
Yes! We're glad to help you set up your infrastructure and guide you in your projects if needed. You will also have access to our in-depth documentation and guides, which help you get the most out of your labeling process.
My data can't leave the company. What can I do?
onetask is designed to excel in such cases. We're glad to set up our platform as an on-premises solution in-house. Your data never has to leave the company, and you still get scalable data labeling.
I already have some automation technique for labeling. What can I do now?
Great, this will kickstart your onetask labeling experience! If you already have something like regular expressions, keyword lookups or rules of thumb to label your data, you can easily embed these techniques into our platform using our SDK within minutes.
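As a sketch of how little wrapping is involved, here is a regular expression and a keyword lookup turned into plain Python functions that return a label or abstain. The patterns, keywords and record layout are hypothetical, and the SDK-specific registration step is left out because it is covered in the documentation rather than in this FAQ.

```python
import re

# Hypothetical existing rules: one regex, one keyword list.
URGENT_PATTERN = re.compile(r"\b(asap|urgent|immediately)\b", re.IGNORECASE)
CALM_KEYWORDS = {"fyi", "whenever", "newsletter"}


def lf_urgent_regex(record):
    # Vote "urgent" if the regex matches, otherwise abstain (None).
    return "urgent" if URGENT_PATTERN.search(record["text"]) else None


def lf_calm_keywords(record):
    # Vote "not urgent" if any calm keyword appears as a whole word.
    words = set(re.findall(r"[a-z']+", record["text"].lower()))
    return "not urgent" if words & CALM_KEYWORDS else None


example = {"text": "Need this ASAP"}
vote = lf_urgent_regex(example)
```

Each function takes one record and returns one label or nothing, which is the shape a labeling function needs.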
Does your solution incorporate Named Entity Recognition?
We currently focus on common classification tasks (binary, multiclass, multilabel). NER is a high-priority feature, which we'll implement soon. If you're interested in this, drop us a note, and we'll keep you up to date.