publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2024
- In Submission: Spatial Consistency Enhanced Dissimilarity Coefficient based Weakly Supervised Object Detection. Aditya Arun, C.V. Jawahar, and M. Pawan Kumar. In Submission, 2024
We consider the problem of weakly supervised object detection, where the training samples have various types of inexpensive annotations. These annotations can indicate the presence or absence of an object category or include count, point, or scribble annotations. In order to model the uncertainty in the location of the objects, we employ a dissimilarity coefficient based probabilistic learning objective. The learning objective minimizes the difference between an annotation agnostic prediction distribution and an annotation aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state of the art discrete generative model that can provide annotation consistent samples from the conditional distribution. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, MS COCO 2014, and MS COCO 2017 data sets demonstrate the efficacy of our proposed approach.
2023
2020
- ECCV: Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances. Aditya Arun, C.V. Jawahar, and M. Pawan Kumar. In Proceedings of European Conference on Computer Vision, 2020
Recent approaches for weakly supervised instance segmentation depend on two components: (i) a pseudo label generation model that provides instances which are consistent with a given annotation; and (ii) an instance segmentation model, which is trained in a supervised manner using the pseudo labels as ground-truth. Unlike previous approaches, we explicitly model the uncertainty in the pseudo label generation process using a conditional distribution. The samples drawn from our conditional distribution provide accurate pseudo labels due to the use of semantic class aware unary terms, boundary aware pairwise smoothness terms, and annotation aware higher order terms. Furthermore, we represent the instance segmentation model as an annotation agnostic prediction distribution. In contrast to previous methods, our representation allows us to define a joint probabilistic learning objective that minimizes the dissimilarity between the two distributions. Our approach achieves state-of-the-art results on the PASCAL VOC 2012 data set, outperforming the best baseline by 4.2% mAP@0.5 and 4.8% mAP@0.75.
@inproceedings{arun2020weakly,
  title = {Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances},
  author = {Arun, Aditya and Jawahar, C.V. and Kumar, M. Pawan},
  booktitle = {Proceedings of European Conference on Computer Vision},
  year = {2020},
}
2019
- CVPR: Dissimilarity Coefficient based Weakly Supervised Object Detection. Aditya Arun, C.V. Jawahar, and M. Pawan Kumar. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019
We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. In order to model the uncertainty in the location of the objects, we employ a dissimilarity coefficient based probabilistic learning objective. The learning objective minimizes the difference between an annotation agnostic prediction distribution and an annotation aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state of the art discrete generative model that can provide annotation consistent samples from the conditional distribution. Extensive experiments on PASCAL VOC 2007 and 2012 data sets demonstrate the efficacy of our proposed approach.
@inproceedings{arun2019dissimilarity,
  title = {Dissimilarity Coefficient based Weakly Supervised Object Detection},
  author = {Arun, Aditya and Jawahar, C.V. and Kumar, M. Pawan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2019},
}
2018
- BMVC: Learning Human Poses From Actions. Aditya Arun, C.V. Jawahar, and M. Pawan Kumar. In Proceedings of the British Machine Vision Conference, 2018
We consider the task of learning to estimate human pose in still images. In order to avoid the high cost of full supervision, we propose to use a diverse data set, which consists of two types of annotations: (i) a small number of images are labeled using the expensive ground-truth pose; and (ii) other images are labeled using the inexpensive action label. As action information helps narrow down the pose of a human, we argue that this approach can help reduce the cost of training without significantly affecting the accuracy. To demonstrate this we design a probabilistic framework that employs two distributions: (i) a conditional distribution to model the uncertainty over the human pose given the image and the action; and (ii) a prediction distribution, which provides the pose of an image without using any action information. We jointly estimate the parameters of the two aforementioned distributions by minimizing their dissimilarity coefficient, as measured by a task-specific loss function. During both training and testing, we only require an efficient sampling strategy for both the aforementioned distributions. This allows us to use deep probabilistic networks that are capable of providing accurate pose estimates for previously unseen images. Using the MPII data set, we show that our approach outperforms baseline methods that either do not use the diverse annotations or rely on pointwise estimates of the pose.
@inproceedings{arun2018learning,
  title = {Learning Human Poses From Actions},
  author = {Arun, Aditya and Jawahar, C.V. and Kumar, M. Pawan},
  booktitle = {Proceedings of the British Machine Vision Conference},
  year = {2018},
}