There are plenty of examples on how to parse JSON files using U-SQL, but what if one of the columns of a TSV contains JSON like this example:
ID Name Vehicle
1 Joe "{""Year"":""2017"",""Make"":""Ford"",""Model"":""Fusion""}"
2 Bob "{""Year"":""2016"",""Make"":""Kia"",""Model"":""Sorento""}"
We can adapt an example of Using the JSON User Defined Functions from the Microsoft.Analytics.Formatting.Samples.Formats library in the U-SQL Examples repository on GitHub.
Once we’ve built and registered the assemblies, the script is rather straightforward.
First up, we need to reference the assemblies we just registered.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
Next, we’ll extract the raw data
@rawData =
EXTRACT Id int,
Name string,
JsonString string
FROM "/file_with_json_column.tsv"
USING Extractors.Tsv(skipFirstNRows:1);
In this next bit, we’ll use the JsonTuple function from the Microsoft.Analytics.Samples.Formats library convert the JSON string into a dictionary. Once we have the dictionary, we’ll pull values out of it.
@rawDataWithTuple =
SELECT Id,
Name,
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(JsonString) AS JsonTuple
FROM @rawData;
@expandedData =
SELECT Id,
Name,
JsonTuple["Year"] AS VehicleYear,
JsonTuple["Make"] AS VehicleMake,
JsonTuple["Model"] AS VehicleModel
FROM @rawDataWithTuple;
Finally, we’ll output the expanded data so we can check our work.
OUTPUT @expandedData
TO "/expandedData.tsv"
USING Outputters.Tsv(outputHeader : true);
The full script and a sample file can be found here.
2 Responses to U-SQL – Extracting Information From a Column Containing JSON