saichandrapandraju/TabQGen

This repository hosts the code for the paper "Answer-Aware Question Generation from Tabular and Textual Data using T5".

In this project, we converted ToTTo into TabQGen (a Question Generation Dataset for Tables). This is done in two steps -

  1. Creating a Multi-Type Question Generator from Text using the SQuAD and BoolQ datasets.
  2. Applying the Multi-Type Question Generator to ToTTo descriptions to generate a question for each of them. The resulting augmented ToTTo dataset is named TabQGen.
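Conceptually, step 2 is a simple augmentation loop over the ToTTo examples. A minimal sketch (the field names `final_sentence` and `question` are illustrative assumptions, not the exact ToTTo/TabQGen schema; see TabQgen_dataset_creation.ipynb for the real processing):

```python
from typing import Callable, Dict, List


def augment_totto(examples: List[Dict],
                  generate_question: Callable[[str], str]) -> List[Dict]:
    """Attach a generated question to each ToTTo-style example.

    The 'final_sentence' and 'question' keys are hypothetical placeholders
    for the table description and the generated label, respectively.
    """
    augmented = []
    for ex in examples:
        description = ex["final_sentence"]         # table description text
        question = generate_question(description)  # question generator from step 1
        augmented.append({**ex, "question": question})
    return augmented
```

Here `generate_question` stands in for the Multi-Type Question Generator trained in step 1; any callable mapping a description to a question can be plugged in.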

For a detailed description of our end-to-end pipeline and our findings, please refer to our paper.

TabQGen datasets can be found in the datasets folder of this repository.

Reproducing our work:

We implemented our entire pipeline as interactive Jupyter notebooks. To reproduce our work, run the following notebooks in order:

  1. TextQGen.ipynb creates the Multi-Type Question Generator from Text -> requires the SQuAD and BoolQ datasets to be downloaded.
  2. TabQgen_dataset_creation.ipynb applies the Multi-Type Question Generator to ToTTo descriptions, which results in the TabQGen dataset. The language folder is taken from Google Research's language repo for processing ToTTo data. This step outputs two types of datasets - a raw TabQGen dataset, which follows a similar structure to ToTTo, and a processed TabQGen dataset, which contains questions (labels) and sub-table data processed using the language repo. These can be found in the datasets folder.
  3. The T5_[small | base | large]_tabqgen folders contain files for training and testing the respective T5 variant on TabQGen, and all follow the same structure -
    • t5_[small | base | large]_train.ipynb -> trains the T5 variant on TabQGen and saves the trained model in the 'table_qgen' directory.
    • t5_[small | base | large]_test.ipynb -> tests the model from the previous step and saves predictions to the 'test_preds.txt' file.
    • t5_[small | base | large]_scoring.ipynb -> takes the predictions from the previous step and calculates the different scores mentioned in the paper.

Models trained on TabQGen were uploaded to the Hugging Face Hub and can be accessed using the links below:

tabqgen_small -> https://huggingface.co/saichandrapandraju/t5_small_tabqgen

tabqgen_base -> https://huggingface.co/saichandrapandraju/t5_base_tabqgen

tabqgen_large -> https://huggingface.co/saichandrapandraju/t5_large_tabqgen
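The uploaded checkpoints can be loaded like any other T5 model via the transformers library. A minimal sketch (the "answer: ... context: ..." prompt format is an assumption for illustration; the exact input format used during training is defined in the t5_*_train.ipynb notebooks):

```python
def build_input(answer: str, context: str) -> str:
    """Build an answer-aware input string for the model.

    Hypothetical prompt format; check the training notebooks for the
    format the checkpoints were actually trained with.
    """
    return f"answer: {answer} context: {context}"


def generate_question(model_name: str, answer: str, context: str) -> str:
    # Imported here so the prompt helper above stays dependency-free.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    inputs = tokenizer(build_input(answer, context),
                       return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    question = generate_question("saichandrapandraju/t5_small_tabqgen",
                                 answer="1995",
                                 context="The stadium opened in 1995.")
    print(question)
```

The same call works for the base and large checkpoints by swapping in the corresponding model name.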

How to Cite

If you extend or use this work, please cite the paper:

@article{iJET25121,
	author = {Saichandra Pandraju and Sakthi Ganesh Mahalingam},
	title = {Answer-Aware Question Generation from Tabular and Textual Data using T5},
	journal = {International Journal of Emerging Technologies in Learning (iJET)},
	volume = {16},
	number = {18},
	year = {2021},
	keywords = {Question Generation, T5, Table-to-Text, Transfer Learning},
	abstract = {Automatic Question Generation (AQG) systems are applied in a myriad of domains to generate questions from sources such as documents, images, knowledge graphs to name a few. With the rising interest in such AQG systems, it is equally important to recognize structured data like tables while generating questions from documents. In this paper, we propose a single model architecture for question generation from tables along with text using “Text-to-Text Transfer Transformer” (T5) - a fully end-to-end model which does not rely on any intermediate planning steps, delexicalization, or copy mechanisms. We also present our systematic approach in modifying the ToTTo dataset, release the augmented dataset as TabQGen along with the scores achieved using T5 as a baseline to aid further research.},
	issn = {1863-0383},
	url = {https://online-journals.org/index.php/i-jet/article/view/25121},
	pages = {256--267}
}
